Foreword



The present document describes the detailed mapping of the wideband telephony speech service employing the
Adaptive Multi-Rate Wideband (AMR-WB) speech coder within the 3GPP system.

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope



This Technical Specification (TS) describes the detailed mapping from input blocks of 320 speech samples in
16-bit uniform PCM format to encoded blocks of 132, 177, 253, 285, 317, 365, 397, 461 and 477 bits and from encoded
blocks of 132, 177, 253, 285, 317, 365, 397, 461 and 477 bits to output blocks of 320 reconstructed speech samples.
The sampling rate is 16 000 samples/s, leading to a bit rate for the encoded bit stream of 6.60, 8.85, 12.65, 14.25, 15.85,
18.25, 19.85, 23.05 or 23.85 kbit/s. The coding scheme for the multi-rate coding modes is the so-called Algebraic Code
Excited Linear Prediction Coder, hereafter referred to as ACELP. The multi-rate wideband ACELP coder is referred to
as MRWB-ACELP.



2 Normative references 

This TS incorporates, by dated and undated reference, provisions from other publications. These normative references
are cited at the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent
amendments to or revisions of any of these publications apply to this TS only when incorporated in it by amendment or
revision. For undated references, the latest edition of the publication referred to applies.

[1] GSM 03.50: "Digital cellular telecommunications system (Phase 2); Transmission planning aspects of the speech
service in the GSM Public Land Mobile Network (PLMN) system".

[2] 3GPP TS 26.201: "AMR wideband speech codec; Frame structure".

[3] 3GPP TS 26.194: "AMR wideband speech codec; Voice Activity Detection (VAD)".

[4] 3GPP TS 26.173: "AMR wideband speech codec; ANSI-C code".

[5] 3GPP TS 26.174: "AMR wideband speech codec; Test sequences".

[6] ITU-T Recommendation G.711 (1988): "Coding of analogue signals by pulse code modulation; Pulse code modulation
(PCM) of voice frequencies".



3 Definitions, symbols and abbreviations 

3.1 Definitions 

For the purposes of this TS, the following definitions apply: 

adaptive codebook: The adaptive codebook contains excitation vectors that are adapted for every subframe. The
adaptive codebook is derived from the long-term filter state. The lag value can be viewed as an index into the adaptive
codebook.

algebraic codebook: A fixed codebook where algebraic code is used to populate the excitation vectors (innovation
vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of potential
positions. The amplitudes and positions of the pulses of the kth excitation codevector can be derived from its index k
through a rule requiring no or minimal physical storage, in contrast with stochastic codebooks, whereby the path from
the index to the associated codevector involves look-up tables.

anti-sparseness processing: An adaptive post-processing procedure applied to the fixed codebook vector in order to 
reduce perceptual artifacts from a sparse fixed codebook vector. 

closed-loop pitch analysis: This is the adaptive codebook search, i.e., a process of estimating the pitch (lag) value from
the weighted input speech and the long-term filter state. In the closed-loop search, the lag is searched using an error
minimization loop (analysis-by-synthesis). In the adaptive multi-rate wideband codec, closed-loop pitch search is
performed for every subframe.

direct form coefficients: One of the formats for storing the short term filter parameters. In the adaptive multi-rate 
wideband codec, all filters which are used to modify speech samples use direct form coefficients. 
fixed codebook: The fixed codebook contains excitation vectors for speech synthesis filters. The contents of the
codebook are non-adaptive (i.e., fixed). In the adaptive multi-rate wideband codec, the fixed codebook is implemented
using an algebraic codebook.

fractional lags: A set of lag values having sub-sample resolution. In the adaptive multi-rate wideband codec, a
sub-sample resolution of 1/4 or 1/2 of a sample is used.

frame: A time interval equal to 20 ms (320 samples at a 16 kHz sampling rate).

Immittance Spectral Frequencies: (see Immittance Spectral Pair) 

Immittance Spectral Pair: Transformation of LPC parameters. Immittance Spectral Pairs are obtained by
decomposing the inverse filter transfer function A(z) into a set of two transfer functions, one having even symmetry and
the other having odd symmetry. The Immittance Spectral Pairs (also called Immittance Spectral Frequencies) are the
roots of these polynomials on the z-unit circle.

integer lags: A set of lag values having whole sample resolution. 

interpolating filter: An FIR filter used to produce an estimate of sub-sample resolution samples, given an input 
sampled with integer sample resolution. In this implementation, the interpolating filter has low pass filter 
characteristics. Thus the adaptive codebook consists of the low-pass filtered interpolated past excitation. 

inverse filter: This filter removes the short term correlation from the speech signal. The filter models an inverse 
frequency response of the vocal tract. 

lag: The long term filter delay. This is typically the true pitch period, or its multiple or sub-multiple. 

LP analysis window: For each frame, the short term filter coefficients are computed using the high pass filtered speech 
samples within the analysis window. In the adaptive multi-rate wideband codec, the length of the analysis window is 
always 384 samples. For all the modes, a single asymmetric window is used to generate a single set of LP coefficients. 
The 5 ms look-ahead is used in the analysis. 

LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a
generic descriptive term for the short term filter coefficients.

mode: When used alone, refers to the source codec mode, i.e., to one of the source codecs employed in the AMR-WB 
codec. 

open-loop pitch search: A process of estimating the near optimal lag directly from the weighted speech input. This is
done to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the
open-loop estimated lags. In the adaptive multi-rate wideband codec, an open-loop pitch search is performed in every
other subframe.

residual: The output signal resulting from an inverse filtering operation. 

short term synthesis filter: This filter introduces, into the excitation signal, short term correlation which models the 
impulse response of the vocal tract. 

perceptual weighting filter: This filter is employed in the analysis-by-synthesis search of the codebooks. The filter 
exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near 
the formant frequencies and more in regions away from them. 

subframe: A time interval equal to 5 ms (80 samples at 16 kHz sampling rate). 

vector quantization: A method of grouping several parameters into a vector and quantizing them simultaneously. 

zero input response: The output of a filter due to past inputs, i.e. due to the present state of the filter, given that an 
input of zeros is applied. 

zero state response: The output of a filter due to the present input, given that no past inputs have been applied, i.e., 
given that the state information in the filter is all zeroes. 

3.2 Symbols 

For the purposes of this TS, the following symbols apply: 
$A(z)$ The inverse filter with unquantized coefficients

$\hat{A}(z)$ The inverse filter with quantized coefficients

$H(z) = 1/\hat{A}(z)$ The speech synthesis filter with quantized coefficients

$a_i$ The unquantized linear prediction parameters (direct form coefficients)

$\hat{a}_i$ The quantized linear prediction parameters

$m$ The order of the LP model

$W(z)$ The perceptual weighting filter (unquantized coefficients)

$\gamma_1$ The perceptual weighting factor

$T$ The integer pitch lag nearest to the closed-loop fractional pitch lag of the subframe

$\beta$ The adaptive pre-filter coefficient (the quantized pitch gain)

$H_{h1}(z)$ Pre-processing high-pass filter

$w(n)$ LP analysis window

$L_1$ Length of the first part of the LP analysis window $w(n)$

$L_2$ Length of the second part of the LP analysis window $w(n)$

$r(k)$ The auto-correlations of the windowed speech $s'(n)$

$w_{lag}(i)$ Lag window for the auto-correlations (60 Hz bandwidth expansion)

$f_0$ The bandwidth expansion in Hz

$f_s$ The sampling frequency in Hz

$r'(k)$ The modified (bandwidth expanded) auto-correlations

$E(i)$ The prediction error in the $i$th iteration of the Levinson algorithm

$k_i$ The $i$th reflection coefficient

$a_j^{(i)}$ The $j$th direct form coefficient in the $i$th iteration of the Levinson algorithm

$F_1'(z)$ Symmetric ISF polynomial

$F_2'(z)$ Antisymmetric ISF polynomial

$F_1(z)$ Polynomial $F_1'(z)$

$F_2(z)$ Polynomial $F_2'(z)$ with roots $z = 1$ and $z = -1$ eliminated

$q_i$ The immittance spectral pairs (ISPs) in the cosine domain

$\mathbf{q}$ An ISP vector in the cosine domain

$\hat{\mathbf{q}}_i^{(n)}$ The quantized ISP vector at the $i$th subframe of the frame $n$

$\omega_i$ The immittance spectral frequencies (ISFs)

$T_m(x)$ An $m$th order Chebyshev polynomial

$f_1(i), f_2(i)$ The coefficients of the polynomials $F_1(z)$ and $F_2(z)$

$f_1'(i), f_2'(i)$ The coefficients of the polynomials $F_1'(z)$ and $F_2'(z)$

$f(i)$ The coefficients of either $F_1(z)$ or $F_2(z)$

$C(x)$ Sum polynomial of the Chebyshev polynomials

$x$ Cosine of angular frequency $\omega$

$b_k$ Recursion coefficients for the Chebyshev polynomial evaluation

$f_i$ The immittance spectral frequencies (ISFs) in Hz

$\mathbf{f}^t = [f_1\ f_2 \cdots f_{16}]$ The vector representation of the ISFs in Hz

$\mathbf{z}(n)$ The mean-removed ISF vector at frame $n$

$\mathbf{r}(n)$ The ISF prediction residual vector at frame $n$

$\mathbf{p}(n)$ The predicted ISF vector at frame $n$

$\hat{\mathbf{r}}(n-1)$ The quantized residual vector at the past frame

$\hat{\mathbf{r}}_i^k$ The quantized ISF subvector $i$ at quantization index $k$

$d_i = f_{i+1} - f_{i-1}$ The distance between the immittance spectral frequencies $f_{i+1}$ and $f_{i-1}$

$h(n)$ The impulse response of the weighted synthesis filter

$H(z)W(z)$ The weighted synthesis filter

$T_1$ The integer nearest to the fractional pitch lag of the previous (1st or 3rd) subframe

$s'(n)$ The windowed speech signal

$s_w(n)$ The weighted speech signal

$\hat{s}(n)$ Reconstructed speech signal

$x(n)$ The target signal for adaptive codebook search

$x_2(n), \mathbf{x}_2^t$ The target signal for algebraic codebook search

$res_{LP}(n)$ The LP residual signal

$c(n)$ The fixed codebook vector

$v(n)$ The adaptive codebook vector

$y(n) = v(n) * h(n)$ The filtered adaptive codebook vector

$y_k(n)$ The past filtered excitation

$u(n)$ The excitation signal

$\hat{u}'(n)$ The gain-scaled emphasized excitation signal

$T_{op}$ The best open-loop lag

$t_{min}$ Minimum lag search value

$t_{max}$ Maximum lag search value

$R(k)$ Correlation term to be maximized in the adaptive codebook search

$R(k)_t$ The interpolated value of $R(k)$ for the integer delay $k$ and fraction $t$

$A_k$ Correlation term to be maximized in the algebraic codebook search at index $k$

$C_k$ The correlation in the numerator of $A_k$ at index $k$

$E_{D_k}$ The energy in the denominator of $A_k$ at index $k$

$\mathbf{d} = \mathbf{H}^t\mathbf{x}_2$ The correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, i.e., backward
filtered target

$\mathbf{H}$ The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals
$h(1), \dots, h(63)$

$\boldsymbol{\Phi} = \mathbf{H}^t\mathbf{H}$ The matrix of correlations of $h(n)$

$d(n)$ The elements of the vector $\mathbf{d}$

$\phi(i,j)$ The elements of the symmetric matrix $\boldsymbol{\Phi}$

$\mathbf{c}_k$ The innovation vector

$C$ The correlation in the numerator of $A_k$

$m_i$ The position of the $i$th pulse

$\vartheta_i$ The amplitude of the $i$th pulse

$N_p$ The number of pulses in the fixed codebook excitation
3GPP TS 26.190 version 8.0.0 Release 8, ETSI TS 126 190 V8.0.0 (2009-01)

$E_D$ The energy in the denominator of $A_k$

$res_{LTP}(n)$ The normalized long-term prediction residual

$b(n)$ The signal used for presetting the signs in algebraic codebook search

$s_b(n)$ The sign signal for the algebraic codebook search

$d'(n)$ Sign extended backward filtered target

$\phi'(i,j)$ The modified elements of the matrix $\boldsymbol{\Phi}$, including sign information

$\mathbf{z}^t, z(n)$ The fixed codebook vector convolved with $h(n)$

$E(n)$ The mean-removed innovation energy (in dB)

$\bar{E}$ The mean of the innovation energy

$\tilde{E}(n)$ The predicted energy

$[b_1\ b_2\ b_3\ b_4]$ The MA prediction coefficients

$\hat{R}(k)$ The quantized prediction error at subframe $k$

$E_I$ The mean innovation energy

$R(n)$ The prediction error of the fixed-codebook gain quantization

$E_Q$ The quantization error of the fixed-codebook gain quantization

$e(n)$ The states of the synthesis filter $1/\hat{A}(z)$

$e_w(n)$ The perceptually weighted error of the analysis-by-synthesis search

$\eta$ The gain scaling factor for the emphasized excitation

$g_c$ The fixed-codebook gain

$g_c'$ The predicted fixed-codebook gain

$\hat{g}_c$ The quantized fixed-codebook gain

$g_p$ The adaptive codebook gain

$\hat{g}_p$ The quantized adaptive codebook gain

$\gamma_{gc} = g_c/g_c'$ A correction factor between the gain $g_c$ and the estimated one $g_c'$

$\hat{\gamma}_{gc}$ The optimum value for $\gamma_{gc}$

$\gamma_{sc}$ Gain scaling factor
3.3 Abbreviations
For the purposes of this TS, the following abbreviations apply. 

ACELP Algebraic Code Excited Linear Prediction 

AGC Adaptive Gain Control 

AMR Adaptive Multi-Rate 

AMR-WB Adaptive Multi-Rate Wideband 

CELP Code Excited Linear Prediction 

FIR Finite Impulse Response 

ISF Immittance Spectral Frequency 

ISP Immittance Spectral Pair 

ISPP Interleaved Single-Pulse Permutation

LP Linear Prediction 

LPC Linear Predictive Coding 

LTP Long Term Predictor (or Long Term Prediction) 

MA Moving Average 

MRWB-ACELP Wideband Multi-Rate ACELP 

S-MSVQ Split-MultiStage Vector Quantization 

WB Wideband 



4 Outline description



This TS is structured as follows:

Subclause 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Subclause 4.2
describes the input format for the AMR-WB encoder and the output format for the AMR-WB decoder. Subclauses 4.3 and 4.4
present a simplified description of the principles of the AMR-WB codec encoding and decoding process, respectively. In
subclause 4.5, the sequence and subjective importance of encoded parameters are given.

Clause 5 presents the functional description of the AMR-WB codec encoding, whereas clause 6 describes the decoding
procedures. In clause 7, the detailed bit allocation of the AMR-WB codec is tabulated. Clause 8 describes the homing
operation.

4.1 Functional description of audio parts 

The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following elements:

1) Analogue to uniform digital PCM

microphone;
input level adjustment device;
input anti-aliasing filter;
sample-hold device sampling at 16 kHz;
analogue-to-uniform digital conversion to 14-bit representation.

The uniform format shall be represented in two's complement.

2) Uniform digital PCM to analogue

conversion from 14-bit/16 kHz uniform PCM to analogue;
a hold device;
reconstruction filter including x/sin(x) correction;
output level adjustment device;
earphone or loudspeaker.

In the terminal equipment, the A/D function may be achieved by direct conversion to 14-bit uniform PCM format.
For the D/A operation, the inverse operations take place.

4.2 Preparation of speech samples 

The encoder is fed with data comprising samples with a resolution of 14 bits left justified in a 16-bit word. The
decoder outputs data in the same format. Outside the speech codec, further processing must be applied if the traffic data
occurs in a different representation.
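As an illustration of this sample format, the following Python sketch places a 14-bit two's-complement sample in the upper 14 bits of a 16-bit word and recovers it again. The function names are hypothetical and are not taken from the reference ANSI-C code [4].

```python
def left_justify_14bit(sample: int) -> int:
    """Place a 14-bit two's-complement sample in the upper bits of a 16-bit word."""
    if not -8192 <= sample <= 8191:
        raise ValueError("sample is outside the 14-bit range")
    return sample << 2  # the two least significant bits of the word are zero


def recover_14bit(word: int) -> int:
    """Recover the 14-bit sample from a left-justified 16-bit word."""
    if not -32768 <= word <= 32767:
        raise ValueError("word is outside the 16-bit range")
    return word >> 2  # arithmetic shift preserves the sign


print(left_justify_14bit(8191))   # 32764
print(left_justify_14bit(-8192))  # -32768
```

Because the two least significant bits are always zero, a 14-bit source and a full 16-bit source can share the same encoder input path.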

4.3 Principles of the adaptive multi-rate wideband speech encoder



The AMR-WB codec consists of nine source codecs with bit rates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65,
8.85 and 6.60 kbit/s.

The codec is based on the code-excited linear predictive (CELP) coding model. The input signal is pre-emphasized
using the filter $H_{\text{pre-emph}}(z) = 1 - 0.68z^{-1}$. The CELP model is then applied to the pre-emphasized signal. A 16th
order linear prediction (LP), or short-term, synthesis filter is used, which is given by:

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}} \qquad (1)$$

where $\hat{a}_i, i = 1, \dots, m$ are the (quantized) linear prediction (LP) parameters, and $m = 16$ is the predictor order. The
long-term, or pitch, synthesis filter is usually given by:

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}} \qquad (2)$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the so-called adaptive
codebook approach.
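To make equations (1) and (2) concrete, here is a minimal Python sketch of both synthesis filters as direct-form recursions with zero initial state. It assumes the sign convention $\hat{A}(z) = 1 + \sum \hat{a}_i z^{-i}$ of (1) and an integer delay $T$. The codec itself realizes (2) through the adaptive codebook with fractional delays and keeps filter memories across subframes, and neither of those features is shown here. The function names are illustrative.

```python
from typing import List, Sequence


def lp_synthesis(u: Sequence[float], a_hat: Sequence[float]) -> List[float]:
    """Filter u(n) through 1/A^(z), A^(z) = 1 + sum_{i=1}^{m} a_hat[i-1] z^-i (zero state)."""
    s: List[float] = []
    for n, un in enumerate(u):
        acc = un
        for i, ai in enumerate(a_hat, start=1):
            if n - i >= 0:
                acc -= ai * s[n - i]  # s(n) = u(n) - sum_i a_i s(n - i)
        s.append(acc)
    return s


def pitch_synthesis(u: Sequence[float], g_p: float, T: int) -> List[float]:
    """Filter u(n) through 1/(1 - g_p z^-T) with an integer delay T (zero state)."""
    out: List[float] = []
    for n, un in enumerate(u):
        out.append(un + (g_p * out[n - T] if n >= T else 0.0))
    return out


print(lp_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5]))          # [1.0, 0.5, 0.25, 0.125]
print(pitch_synthesis([1.0, 0.0, 0.0, 0.0, 0.0], 0.5, 2))  # [1.0, 0.0, 0.5, 0.0, 0.25]
```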

The CELP speech synthesis model is shown in Figure 1. In this model, the excitation signal at the input of the 
short-term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) 
codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the 
short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis 
search procedure in which the error between the original and synthesized speech is minimized according to a 
perceptually weighted distortion measure. 



The perceptual weighting filter used in the analysis-by-synthesis search technique is given by:

$$W(z) = A(z/\gamma_1)\, H_{\text{de-emph}}(z) \qquad (3)$$

where $A(z)$ is the unquantized LP filter, $H_{\text{de-emph}}(z) = \dfrac{1}{1 - 0.68z^{-1}}$ and $\gamma_1 = 0.92$ is the perceptual
weighting factor. The weighting filter uses the unquantized LP parameters.
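The weighting filter of (3) can be sketched as an FIR section $A(z/\gamma_1)$, whose coefficients are $a_i\gamma_1^i$, followed by the first-order de-emphasis recursion. The following Python sketch starts from zero filter memories and uses hypothetical function names. It only shows how (3) translates into difference equations.

```python
from typing import List, Sequence


def weight_coefficients(a: Sequence[float], gamma1: float = 0.92) -> List[float]:
    """Coefficients of A(z/gamma1): a_i -> a_i * gamma1**i, i = 1..m."""
    return [ai * gamma1 ** i for i, ai in enumerate(a, start=1)]


def perceptual_weighting(x: Sequence[float], a: Sequence[float],
                         gamma1: float = 0.92, mu: float = 0.68) -> List[float]:
    """Apply W(z) = A(z/gamma1) / (1 - mu z^-1) to x(n), starting from zero memories."""
    aw = weight_coefficients(a, gamma1)
    out: List[float] = []
    prev = 0.0
    for n in range(len(x)):
        fir = x[n]
        for i, c in enumerate(aw, start=1):
            if n - i >= 0:
                fir += c * x[n - i]   # FIR part: A(z/gamma1)
        prev = fir + mu * prev        # IIR part: 1 / (1 - 0.68 z^-1)
        out.append(prev)
    return out
```

With an empty coefficient list, W(z) reduces to the de-emphasis filter alone, so an impulse produces 1, 0.68, 0.4624, and so on.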



The encoder performs the analysis of the LPC, LTP and fixed codebook parameters at a 12.8 kHz sampling rate. The
coder operates on speech frames of 20 ms. At each frame, the speech signal is analysed to extract the parameters of the
CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). In addition to these parameters,
high-band gain indices are computed in the 23.85 kbit/s mode. These parameters are encoded and transmitted. At the
decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through
the LP synthesis filter.
The signal flow at the encoder is shown in Figure 2. After decimation, high-pass and pre-emphasis filtering is
performed. LP analysis is performed once per frame. The set of LP parameters is converted to immittance spectral
pairs (ISP) and vector quantized using split-multistage vector quantization (S-MSVQ). The speech frame is divided into
4 subframes of 5 ms each (64 samples at 12.8 kHz sampling rate). The adaptive and fixed codebook parameters are
transmitted every subframe. The quantized and unquantized LP parameters or their interpolated versions are used
depending on the subframe. An open-loop pitch lag is estimated in every other subframe or once per frame based on the
perceptually weighted speech signal.

Then the following operations are repeated for each subframe:

The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter W(z)H(z)
with the initial states of the filters having been updated by filtering the error between LP residual and excitation
(this is equivalent to the common approach of subtracting the zero input response of the weighted synthesis filter
from the weighted speech signal).

The impulse response h(n) of the weighted synthesis filter is computed.

Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target x(n) and impulse
response h(n), by searching around the open-loop pitch lag. Fractional pitch with 1/4 or 1/2 of a sample
resolution (depending on the mode and the pitch lag value) is used. The interpolating filter in fractional pitch
search has a low-pass frequency response. Further, there are two potential low-pass characteristics in the
adaptive codebook, and this information is encoded with 1 bit.

The target signal x(n) is updated by removing the adaptive codebook contribution (filtered adaptive codevector),
and this new target, x2(n), is used in the fixed algebraic codebook search (to find the optimum innovation).

The gains of the adaptive and fixed codebook are vector quantized with 6 or 7 bits (with moving average (MA)
prediction applied to the fixed codebook gain).

Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in
the next subframe.

The bit allocation of the AMR-WB codec modes is shown in Table 1. In each 20 ms speech frame, 132, 177, 253, 285,
317, 365, 397, 461 or 477 bits are produced, corresponding to a bit rate of 6.60, 8.85, 12.65, 14.25, 15.85, 18.25,
19.85, 23.05 or 23.85 kbit/s. A more detailed bit allocation among the codec parameters is given in tables 12a to 12i.
Note that the most significant bits (MSB) are always sent first.
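The rule that the most significant bit of each parameter is sent first can be illustrated with a small Python sketch. The (value, width) list in the example is hypothetical. The actual parameter order for each mode is the one given in tables 12a to 12i, and [2] applies a further reordering of the bits. Neither of those is reproduced here.

```python
from typing import Iterable, List, Tuple


def pack_msb_first(fields: Iterable[Tuple[int, int]]) -> List[int]:
    """Serialize (value, width) pairs into bits, most significant bit of each field first."""
    bits: List[int] = []
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        for k in range(width - 1, -1, -1):
            bits.append((value >> k) & 1)
    return bits


# Hypothetical example: a 1-bit flag followed by a 9-bit pitch index of 300.
print(pack_msb_first([(1, 1), (300, 9)]))  # [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
```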
Table 1: Bit allocation of the AMR-WB coding algorithm for a 20 ms frame

Mode          Parameter       1st subframe  2nd subframe  3rd subframe  4th subframe  Total per frame

23.85 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  88            88            88            88            352
              Gains           7             7             7             7             28
              HB-energy       4             4             4             4             16
              Total                                                                   477

23.05 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  88            88            88            88            352
              Gains           7             7             7             7             28
              Total                                                                   461

19.85 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  72            72            72            72            288
              Gains           7             7             7             7             28
              Total                                                                   397

18.25 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  64            64            64            64            256
              Gains           7             7             7             7             28
              Total                                                                   365

15.85 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  52            52            52            52            208
              Gains           7             7             7             7             28
              Total                                                                   317

14.25 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  44            44            44            44            176
              Gains           7             7             7             7             28
              Total                                                                   285

12.65 kbit/s  VAD-flag                                                                1
              ISP                                                                     46
              LTP-filtering   1             1             1             1             4
              Pitch delay     9             6             9             6             30
              Algebraic code  36            36            36            36            144
              Gains           7             7             7             7             28
              Total                                                                   253

8.85 kbit/s   VAD-flag                                                                1
              ISP                                                                     46
              Pitch delay     8             5             8             5             26
              Algebraic code  20            20            20            20            80
              Gains           6             6             6             6             24
              Total                                                                   177

6.60 kbit/s   VAD-flag                                                                1
              ISP                                                                     36
              Pitch delay     8             5             5             5             23
              Algebraic code  12            12            12            12            48
              Gains           6             6             6             6             24
              Total                                                                   132
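The per-mode totals in Table 1 can be cross-checked mechanically. For each mode, the bits per 20 ms frame divided by 20 ms must give the bit rate in kbit/s, because 1 bit/ms equals 1 kbit/s. The Python sketch below simply transcribes Table 1. The dictionary layout and helper names are assumptions made for this example.

```python
FRAME_MS = 20  # frame duration in ms


def _per_subframe(bits: int) -> list:
    """Four equal per-subframe entries, or none if the parameter is absent (bits = 0)."""
    return [bits] * 4 if bits else []


def mode_alloc(isp: int, ltp: int, pitch: list, code: int, gain: int, hb: int = 0) -> dict:
    """One mode of Table 1; ltp, code, gain and hb are bits per subframe."""
    return {"VAD-flag": [1], "ISP": [isp], "LTP-filtering": _per_subframe(ltp),
            "Pitch delay": list(pitch), "Algebraic code": _per_subframe(code),
            "Gains": _per_subframe(gain), "HB-energy": _per_subframe(hb)}


TABLE_1 = {
    "23.85": mode_alloc(46, 1, [9, 6, 9, 6], 88, 7, hb=4),
    "23.05": mode_alloc(46, 1, [9, 6, 9, 6], 88, 7),
    "19.85": mode_alloc(46, 1, [9, 6, 9, 6], 72, 7),
    "18.25": mode_alloc(46, 1, [9, 6, 9, 6], 64, 7),
    "15.85": mode_alloc(46, 1, [9, 6, 9, 6], 52, 7),
    "14.25": mode_alloc(46, 1, [9, 6, 9, 6], 44, 7),
    "12.65": mode_alloc(46, 1, [9, 6, 9, 6], 36, 7),
    "8.85": mode_alloc(46, 0, [8, 5, 8, 5], 20, 6),
    "6.60": mode_alloc(36, 0, [8, 5, 5, 5], 12, 6),
}


def total_bits(alloc: dict) -> int:
    """Sum all per-frame and per-subframe bit counts of one mode."""
    return sum(sum(v) for v in alloc.values())


for mode, alloc in TABLE_1.items():
    bits = total_bits(alloc)
    print(f"{mode} kbit/s: {bits} bits per frame -> {bits / FRAME_MS:.2f} kbit/s")
```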
4.4 Principles of the adaptive multi-rate wideband speech decoder

The signal flow at the decoder is shown in Figure 3. At the decoder, the transmitted indices are extracted from the
received bitstream. The indices are decoded to obtain the coder parameters at each transmission frame. These
parameters are the ISP vector, the 4 fractional pitch lags, the 4 LTP filtering parameters, the 4 innovative codevectors,
and the 4 sets of vector quantized pitch and innovative gains. In the 23.85 kbit/s mode, the high-band gain index is also
decoded. The ISP vector is converted to the LP filter coefficients and interpolated to obtain LP filters at each subframe.
Then, at each 64-sample subframe:

The excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains.

The 12.8 kHz speech is reconstructed by filtering the excitation through the LP synthesis filter.

The reconstructed speech is de-emphasized.

Finally, the reconstructed speech is upsampled to 16 kHz and a high-band speech signal is added in the frequency band
from 6 kHz to 7 kHz.
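The first per-subframe step above, $u(n) = \hat{g}_p v(n) + \hat{g}_c c(n)$, is sketched below in Python. The codevectors and decoded gains are assumed to be supplied by the caller. In the real decoder, $v(n)$ is derived from past excitation, so the four subframes must be processed in sequence. The names used here are hypothetical.

```python
from typing import List, Sequence, Tuple

SUBFRAME_LEN = 64  # samples per 5 ms subframe at 12.8 kHz


def build_excitation(v: Sequence[float], c: Sequence[float],
                     g_p: float, g_c: float) -> List[float]:
    """u(n) = g_p * v(n) + g_c * c(n) for one 64-sample subframe."""
    if len(v) != SUBFRAME_LEN or len(c) != SUBFRAME_LEN:
        raise ValueError("codevectors must have 64 samples")
    return [g_p * vn + g_c * cn for vn, cn in zip(v, c)]


def frame_excitation(subframes: Sequence[Tuple[Sequence[float], Sequence[float], float, float]]) -> List[float]:
    """Concatenate the excitation of the 4 subframes of a 20 ms frame (256 samples)."""
    if len(subframes) != 4:
        raise ValueError("a frame has 4 subframes")
    u: List[float] = []
    for v, c, g_p, g_c in subframes:
        u.extend(build_excitation(v, c, g_p, g_c))
    return u
```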

4.5 Sequence and subjective importance of encoded parameters

The encoder will produce the output information in a unique sequence and format, and the decoder must receive the
same information in the same way. In tables 12a to 12i, the sequence of output bits and the bit allocation for each
parameter are shown.

The different parameters of the encoded speech and their individual bits have unequal importance with respect to
subjective quality. The output and input frame formats for the AMR wideband speech codec are given in [2], where a
reordering of bits takes place.



5 Functional description of the encoder 

In this clause, the different functions of the encoder represented in Figure 2 are described. 



5.1 Pre-processing 



The encoder performs the analysis of the LPC, LTP and fixed codebook parameters at a 12.8 kHz sampling rate.
Therefore, the input signal has to be decimated from 16 kHz to 12.8 kHz. The decimation is performed by first
upsampling by 4, then filtering the output through a lowpass FIR filter $H_{decim}(z)$ that has its cut-off frequency at 6.4 kHz.
Then, the signal is downsampled by 5. The filtering delay is compensated by appending zeros to the end of the input
vector.
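The decimation chain just described (upsample by 4, lowpass at 6.4 kHz, downsample by 5) can be sketched in Python as follows. The coefficients of $H_{decim}(z)$ are not given in this subclause, so the sketch substitutes a hypothetical 121-tap Hamming-windowed sinc designed at the 64 kHz intermediate rate. It uses a plain (not polyphase) convolution for clarity, and it scales the zero-stuffed samples by 4 to preserve the signal level.

```python
import math
from typing import List, Sequence


def windowed_sinc_lowpass(num_taps: int, cutoff_hz: float, fs_hz: float) -> List[float]:
    """Hamming-windowed sinc lowpass (illustrative stand-in, NOT the coefficients of H_decim(z))."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # normalized cut-off in cycles/sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def decimate_16k_to_12k8(x: Sequence[float], num_taps: int = 121) -> List[float]:
    """Upsample by 4 (to 64 kHz), lowpass at 6.4 kHz, downsample by 5 (to 12.8 kHz)."""
    h = windowed_sinc_lowpass(num_taps, 6400.0, 64000.0)
    delay = (num_taps - 1) // 2              # group delay of the linear-phase FIR
    up: List[float] = []
    for s in x:
        up.extend([4.0 * s, 0.0, 0.0, 0.0])  # zero-stuffing; factor 4 restores the level
    up.extend([0.0] * delay)                 # zeros appended to flush the filter delay
    y: List[float] = []
    for n in range(delay, delay + 4 * len(x), 5):  # delay-compensated, every 5th sample
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * up[n - k]
        y.append(acc)
    return y
```

A 320-sample frame at 16 kHz yields 256 samples at 12.8 kHz, which matches the 4 subframes of 64 samples used by the encoder.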

After the decimation, two pre-processing functions are applied to the signal prior to the encoding process: high-pass 
filtering and pre-emphasizing (and signal down-scaling). 

(Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point 
implementation.) 

The high-pass filter serves as a precaution against undesired low-frequency components. A filter with a cut-off
frequency of 50 Hz is used, and it is given by

$$H_{h1}(z) = \frac{0.989502 - 1.979004z^{-1} + 0.989502z^{-2}}{1 - 1.978882z^{-1} + 0.979126z^{-2}} \qquad (4)$$

(Both down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of $H_{h1}(z)$ by 2.)

In the pre-emphasis, a first order high-pass filter is used to emphasize higher frequencies, and it is given by

$$H_{\text{pre-emph}}(z) = 1 - 0.68z^{-1} \qquad (5)$$
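A floating-point Python sketch of (4) and (5) follows. The numerator of $H_{h1}(z)$ is halved so that the down-scaling by 2 is folded into the high-pass filter, as stated above. Both filters start from zero state, and the function names are hypothetical. Because the numerator of (4) vanishes at $z = 1$ and equals the denominator at $z = -1$, the combined filter removes DC and passes the 6.4 kHz (Nyquist) component with gain 0.5.

```python
from typing import List, Sequence

# Coefficients of H_h1(z) from (4); the numerator is halved to include the down-scaling by 2.
B_HP = [0.989502 / 2, -1.979004 / 2, 0.989502 / 2]
A_HP = [1.0, -1.978882, 0.979126]


def highpass_downscale(x: Sequence[float]) -> List[float]:
    """Direct-form I biquad realizing H_h1(z)/2 with zero initial state."""
    x1 = x2 = y1 = y2 = 0.0
    y: List[float] = []
    for xn in x:
        yn = B_HP[0] * xn + B_HP[1] * x1 + B_HP[2] * x2 - A_HP[1] * y1 - A_HP[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def pre_emphasis(x: Sequence[float], mu: float = 0.68) -> List[float]:
    """H_pre-emph(z) = 1 - mu z^-1 of (5), with zero initial state."""
    y: List[float] = []
    prev = 0.0
    for xn in x:
        y.append(xn - mu * prev)
        prev = xn
    return y
```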
5.2 Linear prediction analysis and quantization

Short-term prediction, or LP, analysis is performed once per speech frame using the autocorrelation approach with 30
ms asymmetric windows. A look-ahead of 5 ms is used in the autocorrelation computation. The frame structure is
depicted below.

[Figure: frame structure for LP analysis; frame n consists of 4 subframes of 5 ms (4 x 5 ms)]

The autocorrelations of windowed speech are converted to the LP coefficients using the Levinson-Durbin algorithm.
Then the LP coefficients are transformed to the ISP domain for quantization and interpolation purposes. The
interpolated quantized and unquantized filters are converted back to the LP filter coefficients (to construct the synthesis
and weighting filters at each subframe).

5.2.1 Windowing and auto-correlation computation 

LP analysis is performed once per frame using an asymmetric window. The window has its weight concentrated at the
fourth subframe and it consists of two parts: the first part is half of a Hamming window and the second part is a
quarter of a cosine function cycle. The window is given by:

$$w(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{2L_1 - 1}\right), & n = 0, \dots, L_1 - 1, \\[2mm] \cos\left(\dfrac{2\pi(n - L_1)}{4L_2 - 1}\right), & n = L_1, \dots, L_1 + L_2 - 1, \end{cases} \qquad (6)$$

where the values $L_1 = 256$ and $L_2 = 128$ are used.
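The two-part window of (6) is short enough to generate directly. This Python sketch (function name hypothetical) builds the 384 samples and shows that the peak of the window falls at $n = L_1 = 256$, inside the fourth subframe.

```python
import math
from typing import List


def lp_analysis_window(L1: int = 256, L2: int = 128) -> List[float]:
    """Asymmetric LP analysis window of (6): half Hamming window + quarter cosine cycle."""
    first = [0.54 - 0.46 * math.cos(2 * math.pi * n / (2 * L1 - 1)) for n in range(L1)]
    second = [math.cos(2 * math.pi * (n - L1) / (4 * L2 - 1)) for n in range(L1, L1 + L2)]
    return first + second


w = lp_analysis_window()
print(len(w))  # 384
```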

The autocorrelations of the windowed speech $s'(n), n = 0, \dots, 383$ are computed by

$$r(k) = \sum_{n=k}^{383} s'(n)\,s'(n-k), \quad k = 0, \dots, 16, \qquad (7)$$



and a 60 Hz bandwidth expansion is used by lag windowing the autocorrelations using the window [2]

$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right], \quad i = 1, \dots, 16, \qquad (8)$$

where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 12800$ Hz is the sampling frequency. Further, $r(0)$ is multiplied by
the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB.
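Equations (7) and (8), together with the white noise correction, can be sketched in Python as follows. The autocorrelation routine works on any input length (384 for the windowed speech $s'(n)$), and the function names are hypothetical. This is a floating-point illustration, not the fixed-point reference code [4].

```python
import math
from typing import List, Sequence


def autocorr(s: Sequence[float], order: int = 16) -> List[float]:
    """r(k) = sum_{n=k}^{N-1} s(n) s(n-k), k = 0..order, as in (7) with N = len(s)."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(order + 1)]


def lag_window(order: int = 16, f0: float = 60.0, fs: float = 12800.0) -> List[float]:
    """w_lag(i) = exp(-0.5 * (2*pi*f0*i/fs)**2), i = 1..order, as in (8)."""
    return [math.exp(-0.5 * (2 * math.pi * f0 * i / fs) ** 2) for i in range(1, order + 1)]


def modified_autocorr(r: Sequence[float], f0: float = 60.0, fs: float = 12800.0) -> List[float]:
    """r'(0) = 1.0001 r(0) (white noise correction), r'(k) = r(k) w_lag(k) for k >= 1."""
    w = lag_window(len(r) - 1, f0, fs)
    return [1.0001 * r[0]] + [r[k] * w[k - 1] for k in range(1, len(r))]
```

The lag window decreases monotonically with i, which is what widens the formant bandwidths of the resulting LP filter.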



5.2.2 Levinson-Durbin algorithm 



The modified autocorrelations $r'(0) = 1.0001\,r(0)$ and $r'(k) = r(k)\,w_{lag}(k), k = 1, \dots, 16$, are used to obtain the LP filter
coefficients $a_k, k = 1, \dots, 16$ by solving the set of equations

$$\sum_{k=1}^{16} a_k\, r'(|i-k|) = -r'(i), \quad i = 1, \dots, 16. \qquad (9)$$

The set of equations in (9) is solved using the Levinson-Durbin algorithm [2]. This algorithm uses the following
recursion:

$E(0) = r'(0)$
For $i = 1$ to 16 do
    $a_0^{(i-1)} = 1$
    $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'(i-j)\right] / E(i-1)$
    $a_i^{(i)} = k_i$
    For $j = 1$ to $i-1$ do
        $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
    end
    $E(i) = (1 - k_i^2)\, E(i-1)$
end

The final solution is given as $a_j = a_j^{(16)}, j = 1, \dots, 16$.
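A compact Python transcription of the recursion above is given below. It uses the convention $A(z) = 1 + \sum a_k z^{-k}$ implied by (9). It is a floating-point sketch with hypothetical names and is not the fixed-point reference code [4]. Besides $a_1, \dots, a_{16}$, it returns the reflection coefficients $k_i$ and the final prediction error.

```python
from typing import List, Sequence, Tuple


def levinson_durbin(r: Sequence[float], order: int = 16) -> Tuple[List[float], List[float], float]:
    """Solve (9) for a_1..a_order, with A(z) = 1 + sum_k a_k z^-k.

    Returns (a_1..a_order, reflection coefficients k_1..k_order, final prediction error E).
    """
    a = [1.0] + [0.0] * order          # a[0] = a_0^(i-1) = 1
    err = r[0]                         # E(0) = r'(0)
    refl: List[float] = []
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                 # k_i
        refl.append(k)
        prev = a[:]                    # a_j^(i-1)
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k                       # a_i^(i) = k_i
        err *= 1.0 - k * k             # E(i) = (1 - k_i^2) E(i-1)
    return a[1:], refl, err
```

For example, the first-order autocorrelation sequence r = [1, 0.5, 0.25] gives a = [-0.5, 0] and E = 0.75.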

The LP filter coefficients are converted to the ISP representation [4] for quantization and interpolation purposes. The 
conversions to the ISP domain and back to the LP filter domain are described in the next two sections. 

5.2.3 LP to ISP conversion 

The LP filter coefficients $a_k, k = 1, \dots, 16$, are converted to the ISP representation for quantization and interpolation
purposes. For a 16th order LP filter, the ISPs are defined as the roots of the sum and difference polynomials

$$F_1'(z) = A(z) + z^{-16}A(z^{-1}) \qquad (10)$$

and

$$F_2'(z) = A(z) - z^{-16}A(z^{-1}) \qquad (11)$$

respectively. (The polynomials $F_1'(z)$ and $F_2'(z)$ are symmetric and antisymmetric, respectively.) It can be proven that all
roots of these polynomials are on the unit circle and that they alternate with each other [5]. $F_2'(z)$ has two roots at $z = 1$
($\omega = 0$) and $z = -1$ ($\omega = \pi$). To eliminate these two roots, we define the new polynomials

$$F_1(z) = F_1'(z) \qquad (12)$$

and

$$F_2(z) = F_2'(z)/(1 - z^{-2}). \qquad (13)$$

Polynomials $F_1(z)$ and $F_2(z)$ have 8 and 7 conjugate roots on the unit circle ($e^{\pm j\omega_i}$), respectively. Therefore, the
polynomials can be written as

$$F_1(z) = (1 + a[16]) \prod_{i=0,2,\dots,14} \left(1 - 2q_i z^{-1} + z^{-2}\right) \qquad (14)$$

and

$$F_2(z) = (1 - a[16]) \prod_{i=1,3,\dots,13} \left(1 - 2q_i z^{-1} + z^{-2}\right) \qquad (15)$$

where $q_i = \cos(\omega_i)$, with $\omega_i$ being the immittance spectral frequencies (ISF), and $a[16]$ is the last predictor coefficient.
The ISFs satisfy the ordering property $0 < \omega_0 < \omega_1 < \dots < \omega_{14} < \pi$. We refer to $q_i$ as the ISPs in the cosine domain.

Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric, only the first 8 and 7 coefficients of each polynomial, respectively,
and the last predictor coefficient need to be computed.

The coefficients of these polynomials are found by the recursive relations 

for i = 0 to 7
    $f_1'(i) = a_i + a_{m-i}$
    $f_2'(i) = a_i - a_{m-i} + f_2'(i-2)$
end
$f_1'(8) = 2a_8$                                                          (16)

where m = 16 is the predictor order, and $f_2'(-2) = f_2'(-1) = 0$.

The ISPs are found by evaluating the polynomials $f_1'(z)$ and $f_2'(z)$ at 100 points equally spaced between 0 and $\pi$ and
checking for sign changes. A sign change signifies the existence of a root, and the sign change interval is then divided
4 times to better track the root. Chebyshev polynomials are used to evaluate $f_1'(z)$ and $f_2'(z)$ [6]. In this method, the
roots are found directly in the cosine domain $\{q_i\}$. The polynomials $f_1'(z)$ and $f_2'(z)$ evaluated at $z = e^{j\omega}$ can be written
as

$$f_1'(\omega) = 2e^{-j8\omega} C_1(x) \quad \text{and} \quad f_2'(\omega) = 2e^{-j7\omega} C_2(x) \qquad (17)$$

with

$$C_1(x) = \sum_{i=0}^{7} f_1'(i)\, T_{8-i}(x) + f_1'(8)/2 \quad \text{and} \quad C_2(x) = \sum_{i=0}^{6} f_2'(i)\, T_{7-i}(x) + f_2'(7)/2 \qquad (18)$$

where $T_m = \cos(m\omega)$ is the mth order Chebyshev polynomial, and f(i) are the coefficients of either $f_1'(z)$ or $f_2'(z)$, computed
using the equations in (16). The polynomial C(x) is evaluated at a certain value of $x = \cos(\omega)$ using the recursive
relation:

for k = $n_f$ - 1 down to 1
    $b_k = 2x\, b_{k+1} - b_{k+2} + f(n_f - k)$
end
$C(x) = x\, b_1 - b_2 + f(n_f)/2$

where $n_f = 8$ in the case of $C_1(x)$ and $n_f = 7$ in the case of $C_2(x)$, with initial values $b_{n_f} = f(0)$ and $b_{n_f+1} = 0$. The details of the
Chebyshev polynomial evaluation method are found in [6].
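The LP-to-ISP steps of this section (Eqs. (16) to (18), the grid search and the interval halving) can be sketched in floating point as follows. This is a sketch under stated assumptions, not the normative fixed-point procedure: the 100 grid points are assumed to include both end points 0 and $\pi$, the roots of $C_1(x)$ and $C_2(x)$ are searched separately and then sorted (rather than alternating between the two polynomials), and each root is taken as the midpoint of the final interval. All function names are hypothetical.

```python
import math

def isp_poly_coeffs(a):
    """Eq. (16): f1'(0..8) and f2'(0..7) from a[0..16], with a[0] = 1."""
    m = len(a) - 1                          # predictor order, 16
    nc = m // 2
    f1 = [a[i] + a[m - i] for i in range(nc)] + [2.0 * a[nc]]
    f2 = []
    for i in range(nc):
        prev = f2[i - 2] if i >= 2 else 0.0  # f2'(-2) = f2'(-1) = 0
        f2.append(a[i] - a[m - i] + prev)
    return f1, f2

def cheb_eval(f, x):
    """Eq. (18) via the recursion: sum_{i<nf} f(i) T_{nf-i}(x) + f(nf)/2."""
    nf = len(f) - 1
    b1, b2 = f[0], 0.0                      # b_{nf} = f(0), b_{nf+1} = 0
    for k in range(nf - 1, 0, -1):
        b1, b2 = 2.0 * x * b1 - b2 + f[nf - k], b1
    return x * b1 - b2 + f[nf] / 2.0

def cosine_roots(f, n_grid=100, n_bisect=4):
    """Roots of C(x) in the cosine domain, in order of increasing frequency."""
    grid = [math.cos(math.pi * j / (n_grid - 1)) for j in range(n_grid)]
    roots = []
    for x_lo, x_hi in zip(grid, grid[1:]):
        y_lo = cheb_eval(f, x_lo)
        if y_lo * cheb_eval(f, x_hi) >= 0.0:
            continue                        # no sign change in this interval
        for _ in range(n_bisect):           # divide the interval 4 times
            x_mid = 0.5 * (x_lo + x_hi)
            y_mid = cheb_eval(f, x_mid)
            if y_lo * y_mid <= 0.0:
                x_hi = x_mid
            else:
                x_lo, y_lo = x_mid, y_mid
        roots.append(0.5 * (x_lo + x_hi))
    return roots

def lp_to_isp(a):
    """ISPs q_0..q_14 (cosine domain, increasing frequency) followed by a_16."""
    f1, f2 = isp_poly_coeffs(a)
    q = sorted(cosine_roots(f1) + cosine_roots(f2), reverse=True)
    return q + [a[-1]]
```

As a check: for A(z) = 1, $f_1'(z) = 1 + z^{-16}$ and $f_2'(z) = (1 - z^{-16})/(1 - z^{-2})$, so `lp_to_isp([1.0] + [0.0] * 16)` returns ISPs whose frequencies are close to $(i+1)\pi/16$, i = 0, ..., 14, followed by $a_{16} = 0$.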

5.2.4 ISP to LP conversion 

Once the ISPs are quantized and interpolated, they are converted back to the LP coefficient domain $\{a_k\}$. The
conversion to the LP domain is done as follows. The coefficients of $f_1'(z)$ and $f_2'(z)$ are found by expanding Equations
(14) and (15), knowing the quantized and interpolated ISPs $q_i$, i = 0, ..., m-1, where m = 16; the constant factors of these
equations are applied at the end of the procedure. The following recursive relation is used to compute $f_1'(z)$:

for i = 2 to m/2
    $f_1'(i) = -2q_{2i-2}\, f_1'(i-1) + 2f_1'(i-2)$
    for j = i - 1 down to 2
        $f_1'(j) = f_1'(j) - 2q_{2i-2}\, f_1'(j-1) + f_1'(j-2)$
    end
    $f_1'(1) = f_1'(1) - 2q_{2i-2}$
end

with initial values $f_1'(0) = 1$ and $f_1'(1) = -2q_0$. The coefficients $f_2'(i)$ are computed similarly by replacing $q_{2i-2}$ by $q_{2i-1}$ and m/2
by m/2 - 1, and with initial conditions $f_2'(0) = 1$ and $f_2'(1) = -2q_1$.

Once the coefficients of $f_1'(z)$ and $f_2'(z)$ are found, $f_2'(z)$ is multiplied by $1 - z^{-2}$ to obtain $f_2(z)$; that is

$$f_2(i) = f_2'(i) - f_2'(i-2), \quad i = 2, \ldots, m/2-1, \qquad f_1(i) = f_1'(i), \quad i = 0, \ldots, m/2 \qquad (19)$$

with $f_2(i) = f_2'(i)$ for i = 0, 1. Then $f_1(z)$ and $f_2(z)$ are multiplied by $1 + q_{m-1}$ and $1 - q_{m-1}$, respectively. That is

$$f_2(i) \leftarrow (1 - q_{m-1})\, f_2(i), \quad i = 0, \ldots, m/2-1; \qquad f_1(i) \leftarrow (1 + q_{m-1})\, f_1(i), \quad i = 0, \ldots, m/2$$

Finally the LP coefficients are found by

$$a_i = \begin{cases} 0.5 f_1(i) + 0.5 f_2(i), & i = 1, \ldots, m/2-1, \\ 0.5 f_1(m-i) - 0.5 f_2(m-i), & i = m/2+1, \ldots, m-1, \\ 0.5 f_1(m/2), & i = m/2. \end{cases} \qquad (20)$$

This is directly derived from the relation $A(z) = (f_1(z) + f_2(z))/2$, and considering the fact that $f_1(z)$ and $f_2(z)$ are
symmetric and antisymmetric polynomials, respectively.
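A floating-point sketch of the ISP-to-LP conversion follows. It assumes the input vector holds $q_0, \ldots, q_{15}$, where $q_{15}$ is the value used in the factors $1 \pm q_{m-1}$; with that convention the last LP coefficient comes out as $a_{16} = q_{15}$. Function names are hypothetical, and the normative behaviour remains that of the fixed-point description.

```python
import math

def expand_products(q_sub, n):
    """First n+1 coefficients of prod_k (1 - 2 q_sub[k] z^-1 + z^-2), using symmetry."""
    f = [0.0] * (n + 1)
    f[0], f[1] = 1.0, -2.0 * q_sub[0]
    for i in range(2, n + 1):
        b = -2.0 * q_sub[i - 1]              # -2 q_{2i-2} (f1') or -2 q_{2i-1} (f2')
        f[i] = b * f[i - 1] + 2.0 * f[i - 2]
        for j in range(i - 1, 1, -1):        # j = i-1 down to 2
            f[j] += b * f[j - 1] + f[j - 2]
        f[1] += b
    return f

def isp_to_lp(q):
    """Convert ISPs q[0..15] (cosine domain) to LP coefficients a[0..16]."""
    m = len(q)
    nc = m // 2
    f1 = expand_products(q[0:m - 1:2], nc)       # uses q0, q2, ..., q14
    f2p = expand_products(q[1:m - 1:2], nc - 1)  # uses q1, q3, ..., q13
    # Eq. (19): multiply f2'(z) by (1 - z^-2); f1(i) = f1'(i)
    f2 = [f2p[i] - (f2p[i - 2] if i >= 2 else 0.0) for i in range(nc)]
    # Constant factors (1 + q_{m-1}) and (1 - q_{m-1})
    f1 = [(1.0 + q[m - 1]) * c for c in f1]
    f2 = [(1.0 - q[m - 1]) * c for c in f2]
    # Eq. (20): A(z) = (f1(z) + f2(z)) / 2, f1 symmetric and f2 antisymmetric
    a = [0.0] * (m + 1)
    a[0] = 1.0
    for i in range(1, nc):
        a[i] = 0.5 * (f1[i] + f2[i])
        a[m - i] = 0.5 * (f1[i] - f2[i])
    a[nc] = 0.5 * f1[nc]
    a[m] = q[m - 1]                      # 0.5 * (f1(0) - f2(0)) simplifies to q_{m-1}
    return a

# Usage: interleaved ISFs (i + 1) * pi / 16 with q15 = 0 correspond to A(z) = 1
q_flat = [math.cos((i + 1) * math.pi / 16) for i in range(15)] + [0.0]
a_flat = isp_to_lp(q_flat)               # close to [1.0, 0.0, ..., 0.0]
```

With the same 15 ISPs and $q_{15} = 0.3$, the sketch returns $A(z) = 1 + 0.3z^{-16}$, because then $f_1(z) = 1.3(1 + z^{-16})$ and $f_2(z) = 0.7(1 - z^{-16})$.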

5.2.5 Quantization of the ISP coefficients

The LP filter coefficients are quantized using the ISP representation in the frequency domain; that is

$$f_i = \begin{cases} \dfrac{f_s}{2\pi} \arccos(q_i), & i = 0, \ldots, 14, \\[4pt] \dfrac{f_s}{4\pi} \arccos(q_i), & i = 15, \end{cases} \qquad (21)$$

where $f_i$ are the ISFs in Hz [0, 6400] and $f_s = 12800$ is the sampling frequency. The ISF vector is given by
$\mathbf{f}^t = [f_0\ f_1 \ldots f_{15}]$, with t denoting transpose.
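Eq. (21) and its inverse can be sketched as below; the function names are hypothetical. Note the halved scale for the last parameter (i = 15).

```python
import math

FS = 12800.0  # sampling frequency f_s in Hz

def isp_to_isf_hz(q):
    """Eq. (21): ISPs q_0..q_15 (cosine domain) to ISFs in Hz."""
    f = [FS / (2.0 * math.pi) * math.acos(qi) for qi in q[:15]]
    f.append(FS / (4.0 * math.pi) * math.acos(q[15]))   # i = 15 uses f_s / (4 pi)
    return f

def isf_hz_to_isp(f):
    """Inverse of Eq. (21)."""
    q = [math.cos(2.0 * math.pi * fi / FS) for fi in f[:15]]
    q.append(math.cos(4.0 * math.pi * f[15] / FS))
    return q
```

For instance, $q_i = 0$ maps to 3200 Hz for i < 15 but to 1600 Hz for i = 15.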

A first-order MA prediction is applied, and the residual ISF vector is quantized using a combination of split vector
quantization (SVQ) and multi-stage vector quantization (MSVQ). The prediction and quantization are performed as
follows. Let z(n) denote the mean-removed ISF vector at frame n. The prediction residual vector r(n) is given by:

$$\mathbf{r}(n) = \mathbf{z}(n) - \mathbf{p}(n) \qquad (22)$$

where p(n) is the predicted ISF vector at frame n. First-order moving-average (MA) prediction is used, where:

$$\mathbf{p}(n) = \tfrac{1}{3}\,\hat{\mathbf{r}}(n-1) \qquad (23)$$

where $\hat{\mathbf{r}}(n-1)$ is the quantized residual vector at the past frame.
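The prediction loop of Equations (22) and (23) can be sketched as below. Assumptions: the prediction factor is taken as 1/3 (check this constant against the normative fixed-point description), and `toy_quantizer` is a hypothetical uniform quantizer standing in for the S-MSVQ described next. Because the predictor uses the quantized residual, the decoder can form the same prediction, so the reconstruction error equals the residual quantization error.

```python
MA_FACTOR = 1.0 / 3.0   # prediction factor of Eq. (23) (assumed value)

def ma_encode(z, r_hat_prev, quantize):
    """Quantized prediction residual for the mean-removed ISF vector z."""
    p = [MA_FACTOR * x for x in r_hat_prev]        # p(n) = (1/3) r_hat(n-1)
    r = [zi - pv for zi, pv in zip(z, p)]          # r(n) = z(n) - p(n)
    return quantize(r)

def ma_decode(r_hat, r_hat_prev):
    """Reconstructed mean-removed ISF vector: r_hat(n) + p(n)."""
    return [ri + MA_FACTOR * xi for ri, xi in zip(r_hat, r_hat_prev)]

def toy_quantizer(r, step=10.0):
    """Hypothetical stand-in for S-MSVQ: uniform scalar rounding."""
    return [step * round(x / step) for x in r]
```

Each frame, the encoder and decoder both keep the last quantized residual as `r_hat_prev` for the next frame.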

The ISF residual vector r is quantized using split-multistage vector quantization (S-MSVQ). The vector is split into 2
subvectors $\mathbf{r}_1(n)$ and $\mathbf{r}_2(n)$ of dimensions 9 and 7, respectively. The 2 subvectors are quantized in two stages. In the first
stage, $\mathbf{r}_1(n)$ is quantized with 8 bits and $\mathbf{r}_2(n)$ with 8 bits.

For the 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, the quantization error vectors
$\mathbf{r}_i^{(2)} = \mathbf{r}_i - \hat{\mathbf{r}}_i$, i = 1, 2, are split in the next stage into 3 and 2 subvectors, respectively. The subvectors are quantized
using the bit-rates described in Table 2.
Table 2. Quantization of ISP vector for the 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes

Input: unquantized 16-element-long ISP vector

Stage | Subvector | Bits
1 | $\mathbf{r}_1$ | 8
1 | $\mathbf{r}_2$ | 8
2 | $\mathbf{r}_1^{(2)}$, elements 0-2 | 6
2 | $\mathbf{r}_1^{(2)}$, elements 3-5 | 7
2 | $\mathbf{r}_1^{(2)}$, elements 6-8 | 7
2 | $\mathbf{r}_2^{(2)}$, elements 0-2 | 5
2 | $\mathbf{r}_2^{(2)}$, elements 3-6 | 5



For the 6.60 kbit/s mode, the quantization error vectors $\mathbf{r}_i^{(2)} = \mathbf{r}_i - \hat{\mathbf{r}}_i$, i = 1, 2, are split in the next stage into 2 and 1
subvectors, respectively. The subvectors are quantized using the bit-rates described in Table 3.

Table 3. Quantization of ISP vector for the 6.60 kbit/s mode

Input: unquantized 16-element-long ISP vector

Stage | Subvector | Bits
1 | $\mathbf{r}_1$ | 8
1 | $\mathbf{r}_2$ | 8
2 | $\mathbf{r}_1^{(2)}$, elements 0-4 | 7
2 | $\mathbf{r}_1^{(2)}$, elements 5-8 | 7
2 | $\mathbf{r}_2^{(2)}$, elements 0-6 | 6



A squared-error ISP distortion measure is used in the quantization process. In general, for an input ISP or error residual
subvector $\mathbf{r}_i$, i = 1, 2, and a quantized vector at index k, $\hat{\mathbf{r}}_i^{k}$, the quantization is performed by finding the index k which
minimizes

$$E = \sum_{j=m}^{n} \left[ r_i(j) - \hat{r}_i^{k}(j) \right]^2 \qquad (24)$$

where m and n are the first and last elements of the subvector.
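A sketch of the search of Eq. (24), plus a greedy two-stage use of it in the spirit of Tables 2 and 3. The codebooks passed in are hypothetical placeholders (the actual tables are in the fixed-point description), and an encoder may search more elaborately than this single-survivor sequential search. Function names are hypothetical.

```python
def search_subvector(r, codebook, first, last):
    """Index k minimising Eq. (24) over elements first..last (inclusive) of r."""
    best_k, best_e = -1, float("inf")
    for k, cand in enumerate(codebook):
        e = sum((r[first + j] - c) ** 2 for j, c in enumerate(cand))
        if e < best_e:
            best_k, best_e = k, e
    return best_k, best_e

def two_stage_search(r, stage1, stage2_splits):
    """Stage 1 on the whole subvector, then stage 2 on the stage-1 error.

    stage2_splits: list of (codebook, first, last) triples covering the error vector.
    """
    k1, _ = search_subvector(r, stage1, 0, len(r) - 1)
    err = [ri - ci for ri, ci in zip(r, stage1[k1])]    # r^(2) = r - r_hat
    return k1, [search_subvector(err, cb, lo, hi)[0] for cb, lo, hi in stage2_splits]
```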

5.2.6 Interpolation of the ISPs 

The set of quantized (and unquantized) LP parameters is used for the fourth subframe, whereas the first, second, and
third subframes use a linear interpolation of the parameters in the adjacent frames. The interpolation is performed on the
ISPs in the q domain. Let $\mathbf{q}_4^{(n)}$ be the ISP vector at the 4th subframe of the frame, and $\mathbf{q}_4^{(n-1)}$ the ISP vector at the 4th
subframe of the past frame n-1. The interpolated ISP vectors at the 1st, 2nd, and 3rd subframes are given by

$$\mathbf{q}_1^{(n)} = 0.55\,\mathbf{q}_4^{(n-1)} + 0.45\,\mathbf{q}_4^{(n)},$$

$$\mathbf{q}_2^{(n)} = 0.2\,\mathbf{q}_4^{(n-1)} + 0.8\,\mathbf{q}_4^{(n)},$$

$$\mathbf{q}_3^{(n)} = 0.04\,\mathbf{q}_4^{(n-1)} + 0.96\,\mathbf{q}_4^{(n)}.$$



The same formula is used for interpolation of the unquantized ISPs. The interpolated ISP vectors are used to compute a 
different LP filter at each subframe (both quantized and unquantized) using the ISP to LP conversion method described 
in Section 5.2.4. 
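The interpolation weights above can be applied as follows; the helper name is hypothetical, and the same function serves quantized and unquantized ISPs.

```python
# Weight of the past frame's 4th-subframe ISPs for subframes 1..4
PAST_WEIGHTS = (0.55, 0.2, 0.04, 0.0)

def interpolate_isps(q_prev, q_curr):
    """ISP vectors of subframes 1..4 of frame n.

    q_prev: 4th-subframe ISPs of frame n-1; q_curr: 4th-subframe ISPs of frame n.
    """
    return [[w * p + (1.0 - w) * c for p, c in zip(q_prev, q_curr)] for w in PAST_WEIGHTS]
```

Each returned vector would then go through the ISP-to-LP conversion to give one LP filter per subframe.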



5.3 Perceptual weighting 



The traditional perceptual weighting filter $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$ has inherent limitations in modelling the
formant structure and the required spectral tilt concurrently. The spectral tilt is more pronounced in wideband signals
due to the wide dynamic range between low and high frequencies. A solution to this problem is to introduce the
preemphasis filter at the input, compute the LP filter A(z) based on the preemphasized speech s(n), and use a modified
filter W(z) by fixing its denominator. This structure substantially decouples the formant weighting from the tilt.

A weighting filter of the form $W(z) = A(z/\gamma_1) H_{\text{de-emph}}(z)$ is used, where $H_{\text{de-emph}}(z) = 1/(1 - \beta_1 z^{-1})$ and $\beta_1 = 0.68$.

Because A(z) is computed based on the preemphasized speech signal s(n), the tilt of the filter $1/A(z/\gamma_1)$ is less
pronounced compared to the case when A(z) is computed based on the original speech. Since deemphasis is performed
at the decoder end, it can be shown that the quantization error spectrum is shaped by a filter having a transfer function
$W^{-1}(z) H_{\text{de-emph}}(z) = 1/A(z/\gamma_1)$. Thus, the spectrum of the quantization error is shaped by a filter whose transfer function is
$1/A(z/\gamma_1)$, with A(z) computed based on the preemphasized speech signal.



5.4 Open-loop pitch analysis 



Depending on the mode, open-loop pitch analysis is performed once per frame (every 20 ms) or twice per frame (every 10
ms) to find one or two estimates of the pitch lag in each frame. This is done in order to simplify the pitch analysis and confine
the closed-loop pitch search to a small number of lags around the open-loop estimated lags.

Open-loop pitch estimation is based on the weighted speech signal $s_w(n)$, which is obtained by filtering the input
speech signal through the weighting filter $W(z) = A(z/\gamma_1) H_{\text{de-emph}}(z)$, where $H_{\text{de-emph}}(z) = 1/(1 - \beta_1 z^{-1})$ and $\beta_1 = 0.68$. That
is, in a subframe of size L, the weighted speech is given by

$$s_w(n) = s(n) + \sum_{i=1}^{16} a_i \gamma_1^i\, s(n-i) + \beta_1 s_w(n-1), \quad n = 0, \ldots, L-1. \qquad (25)$$
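Eq. (25) can be sketched as a direct-form filter. The value of $\gamma_1$ is not given in this section, so it is a required argument here; the function name and the memory arguments are hypothetical.

```python
def weighted_speech(s, a, gamma1, beta1=0.68, mem_s=None, mem_w=0.0):
    """Eq. (25) for one subframe.

    s: preemphasized speech s(0..L-1); a: a[0..16] with a[0] = 1;
    mem_s: the len(a) - 1 samples preceding s (oldest first), zeros if None;
    mem_w: s_w(-1), the last weighted sample of the previous subframe.
    """
    m = len(a) - 1
    x = (list(mem_s) if mem_s is not None else [0.0] * m) + list(s)  # x[m + n] = s(n)
    aw = [a[i] * gamma1 ** i for i in range(m + 1)]                  # A(z/gamma1)
    out, prev = [], mem_w
    for n in range(len(s)):
        prev = sum(aw[i] * x[m + n - i] for i in range(m + 1)) + beta1 * prev
        out.append(prev)
    return out
```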

The open-loop pitch analysis is performed on a signal decimated by two. The decimated signal is obtained by filtering
$s_w(n)$ through a fourth-order FIR filter $H_{\text{decim2}}(z)$ and then downsampling the output by two to obtain the signal $s_{wd}(n)$.

5.4.1 6.60 kbit/s mode 

Open-loop pitch analysis is performed once per frame (every 20 ms) to find an estimate of the pitch lag in each frame. 

The open-loop pitch analysis is performed as follows. First, the correlation of decimated weighted speech is determined 
for each pitch lag value d by: 

$$C(d) = \sum_{n=0}^{127} s_{wd}(n)\, s_{wd}(n-d)\, w(d), \quad d = 17, \ldots, 115, \qquad (26)$$

where w(d) is a weighting function. The estimated pitch lag is the delay that maximises the weighted correlation
function C(d). The weighting emphasises lower pitch lag values, reducing the likelihood of selecting a multiple of the
correct delay. The weighting function consists of two parts: a low pitch lag emphasis function, $w_l(d)$, and a previous
frame lag neighbouring emphasis function, $w_n(d)$:

$$w(d) = w_l(d)\, w_n(d). \qquad (27)$$

The low pitch lag emphasis function is given by:

$$w_l(d) = cw(d) \qquad (28)$$

where cw(d) is defined by a table in the fixed point computational description. The previous frame lag neighbouring
emphasis function depends on the pitch lag of previous speech frames:

w„(^)=H^--'l"'^'^' ^>°-'' (29) 

1 .0, otherwise. 
where $T_{old}$ is the median filtered pitch lag of 5 previous voiced speech half-frames and v is an adaptive parameter. If the
frame is classified as voiced by having the open-loop gain g > 0.6, then the v-value is set to 1.0 for the next frame.
Otherwise, the v-value is updated by v = 0.9v. The open-loop gain is given by:

$$g = \frac{\sum_{n=0}^{127} s_{wd}(n)\, s_{wd}(n-d_{max})}{\sqrt{\sum_{n=0}^{127} s_{wd}^2(n) \sum_{n=0}^{127} s_{wd}^2(n-d_{max})}} \qquad (30)$$

where $d_{max}$ is the pitch delay that maximizes C(d). The median filter is updated only during voiced speech frames. The
weighting depends on the reliability of the old pitch lags. If previous frames have contained unvoiced speech or silence,
the weighting is attenuated through the parameter v.
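The search of Eqs. (26) and (30) and the update rule for v can be sketched as follows. The `weight` argument is a placeholder for w(d): the cw(d) table and the neighbouring emphasis $w_n(d)$ are not reproduced here, so the default is a constant 1.0. Function names are hypothetical. For the modes of Section 5.4.2, the same function applies with a 64-sample window (Eqs. (31) and (35)).

```python
import math

def open_loop_pitch(x, n_len, d_lo=17, d_hi=115, weight=lambda d: 1.0):
    """Lag search of Eq. (26) and open-loop gain of Eq. (30).

    x: decimated weighted speech; its last n_len samples form the analysis window
    and must be preceded by at least d_hi samples of history. Returns (d_max, g).
    """
    start = len(x) - n_len
    if start < d_hi:
        raise ValueError("need at least d_hi samples of history")

    def corr(d):
        return sum(x[start + n] * x[start + n - d] for n in range(n_len))

    # max() returns the first (smallest) lag among equal maxima
    d_max = max(range(d_lo, d_hi + 1), key=lambda d: corr(d) * weight(d))
    e0 = sum(x[start + n] ** 2 for n in range(n_len))
    e1 = sum(x[start + n - d_max] ** 2 for n in range(n_len))
    g = corr(d_max) / math.sqrt(e0 * e1) if e0 > 0.0 and e1 > 0.0 else 0.0
    return d_max, g

def update_v(v, g):
    """v is reset to 1.0 after a voiced frame (g > 0.6) and decays by 0.9 otherwise."""
    return 1.0 if g > 0.6 else 0.9 * v
```

A signal that repeats exactly every 32 samples gives d_max = 32 and g close to 1.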

5.4.2 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes

Open-loop pitch analysis is performed twice per frame (every 10 ms) to find two estimates of the pitch lag in each 
frame. 

The open-loop pitch analysis is performed as follows. First, the correlation of decimated weighted speech is determined 
for each pitch lag value d by: 

$$C(d) = \sum_{n=0}^{63} s_{wd}(n)\, s_{wd}(n-d)\, w(d), \quad d = 17, \ldots, 115, \qquad (31)$$

where w(d) is a weighting function. The estimated pitch lag is the delay that maximises the weighted correlation
function C(d). The weighting emphasises lower pitch lag values, reducing the likelihood of selecting a multiple of the
correct delay. The weighting function consists of two parts: a low pitch lag emphasis function, $w_l(d)$, and a previous
frame lag neighbouring emphasis function, $w_n(d)$:

$$w(d) = w_l(d)\, w_n(d). \qquad (32)$$

The low pitch lag emphasis function is given by:

$$w_l(d) = cw(d) \qquad (33)$$

where cw(d) is defined by a table in the fixed point computational description. The previous frame lag neighbouring
emphasis function depends on the pitch lag of previous speech frames:

w„(^)=H^--'l"'^'^' ^>'-'' (34) 

[ 1.0, otherwise, 

where $T_{old}$ is the median filtered pitch lag of 5 previous voiced speech half-frames and v is an adaptive parameter. If the
frame is classified as voiced by having the open-loop gain g > 0.6, then the v-value is set to 1.0 for the next frame.
Otherwise, the v-value is updated by v = 0.9v. The open-loop gain is given by:

$$g = \frac{\sum_{n=0}^{63} s_{wd}(n)\, s_{wd}(n-d_{max})}{\sqrt{\sum_{n=0}^{63} s_{wd}^2(n) \sum_{n=0}^{63} s_{wd}^2(n-d_{max})}} \qquad (35)$$

where $d_{max}$ is the pitch delay that maximizes C(d). The median filter is updated only during voiced speech frames. The
weighting depends on the reliability of the old pitch lags. If previous frames have contained unvoiced speech or silence,
the weighting is attenuated through the parameter v.
5.5 Impulse response computation



The impulse response, h(n), of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) H_{\text{de-emph}}(z)/A(z)$ is computed each
subframe. This impulse response is needed for the search of adaptive and fixed codebooks. The impulse response h(n)
is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$ extended by zeros through the two filters
$1/A(z)$ and $H_{\text{de-emph}}(z)$.
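A sketch of the impulse-response computation: the coefficients of $A(z/\gamma_1)$, padded with zeros, are filtered through 1/A(z) and then through $H_{\text{de-emph}}(z) = 1/(1 - 0.68z^{-1})$. The weighting factor $\gamma_1$ is a required argument because its value is not given in this section; the function name is hypothetical.

```python
def impulse_response(a, gamma1, beta1=0.68, length=64):
    """h(n) of A(z/gamma1) H_de-emph(z) / A(z), with H_de-emph(z) = 1 / (1 - beta1 z^-1).

    a: LP coefficients a[0..m] with a[0] = 1.
    """
    m = len(a) - 1
    if length <= m:
        raise ValueError("length must exceed the LP order")
    # Coefficients of A(z/gamma1), extended by zeros
    u = [a[i] * gamma1 ** i for i in range(m + 1)] + [0.0] * (length - m - 1)
    # Filter through 1/A(z): y(n) = u(n) - sum_{i=1}^{m} a_i y(n-i)
    y = []
    for n in range(length):
        y.append(u[n] - sum(a[i] * y[n - i] for i in range(1, min(n, m) + 1)))
    # Filter through H_de-emph(z): h(n) = y(n) + beta1 h(n-1)
    h, prev = [], 0.0
    for v in y:
        prev = v + beta1 * prev
        h.append(prev)
    return h
```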



5.6 Target signal computation 



The target signal for adaptive codebook search is usually computed by subtracting the zero-input response of the
weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) H_{\text{de-emph}}(z)/A(z)$ from the weighted speech signal $s_w(n)$. This is
performed on a subframe basis.

An equivalent procedure for computing the target signal, which is used in this codec, is the filtering of the LP residual
signal r(n) through the combination of the synthesis filter 1/A(z) and the weighting filter $A(z/\gamma_1) H_{\text{de-emph}}(z)$. After
determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference
between the LP residual and the excitation. The memory update of these filters is explained in Section 5.10.

The residual signal r(n) which is needed for finding the target vector is also used in the adaptive codebook search to 
extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the 
subframe size of 64 as will be explained in the next section. The LP residual is given by 



$$r(n) = s(n) + \sum_{i=1}^{16} \hat{a}_i\, s(n-i), \quad n = 0, \ldots, 63. \qquad (36)$$
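Eq. (36) is a 16-tap FIR filter. A minimal sketch follows, with a hypothetical function name and the previous 16 speech samples passed explicitly as filter memory.

```python
def lp_residual(s, a_hat, mem):
    """Eq. (36): r(n) = s(n) + sum_{i=1}^{m} a_hat_i s(n-i).

    a_hat: a_hat[0..m] with a_hat[0] = 1; mem: the m samples preceding s, oldest first.
    """
    m = len(a_hat) - 1
    if len(mem) != m:
        raise ValueError("mem must hold len(a_hat) - 1 samples")
    x = list(mem) + list(s)                    # x[m + n] = s(n)
    return [sum(a_hat[i] * x[m + n - i] for i in range(m + 1)) for n in range(len(s))]
```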



5.7 Adaptive codebook 



Adaptive codebook search is performed on a subframe basis. It consists of performing closed loop pitch search, and 
then computing the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag. 

The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the search stage, 
the excitation is extended by the LP residual to simplify the closed-loop search. 

In the 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, in the first and third subframes, a fractional pitch
delay is used with resolution 1/4 in the range $[34, 127\tfrac{3}{4}]$, resolution 1/2 in the range $[128, 159\tfrac{1}{2}]$, and integers only
in the range [160, 231]. For the second and fourth subframes, a pitch resolution of 1/4 is always used in the range
$[T_1 - 8, T_1 + 7\tfrac{3}{4}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe.

In the 8.85 kbit/s mode, in the first and third subframes, a fractional pitch delay is used with resolution 1/2 in the range
$[34, 91\tfrac{1}{2}]$, and integers only in the range [92, 231]. For the second and fourth subframes, a pitch resolution of 1/2 is
always used in the range $[T_1 - 8, T_1 + 7\tfrac{1}{2}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or
3rd) subframe.

In the 6.60 kbit/s mode, in the first subframe, a fractional pitch delay is used with resolution 1/2 in the range $[34, 91\tfrac{1}{2}]$,
and integers only in the range [92, 231]. For the second, third and fourth subframes, a pitch resolution of 1/2 is always
used in the range $[T_1 - 8, T_1 + 7\tfrac{1}{2}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the first subframe.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the 8.85, 12.65,
14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, in the first (and third) subframe the range $T_{op} \pm 7$, bounded by
34...231, is searched. In the 6.60 kbit/s mode, in the first subframe the range $T_{op} \pm 7$, bounded by 34...231, is searched. For
all the modes, for the other subframes, closed-loop pitch analysis is performed around the integer pitch selected in the
previous subframe, as described above. In the 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, the pitch
delay is encoded with 9 bits in the first and third subframes and the relative delay of the other subframes is encoded
with 6 bits. In the 8.85 kbit/s mode, the pitch delay is encoded with 8 bits in the first and third subframes and the relative
delay of the other subframes is encoded with 5 bits. In the 6.60 kbit/s mode, the pitch delay is encoded with 8 bits in the
first subframe and the relative delay of the other subframes is encoded with 5 bits.

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and
synthesized speech. This is achieved by maximizing the term

$$T_k = \frac{\sum_{n=0}^{63} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{63} y_k(n)\, y_k(n)}} \qquad (37)$$

where x(n) is the target signal and $y_k(n)$ is the past filtered excitation at delay k (past excitation convolved with h(n)).
Note that the search range is limited around the open-loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay in the searched range, and for the other delays, it is updated using
the recursive relation

$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n) \qquad (38)$$

where u(n), n = -(231+17), ..., 63, is the excitation buffer. Note that in the search stage, the samples u(n), n = 0, ..., 63, are
not known, and they are needed for pitch delays less than 64. To simplify the search, the LP residual is copied to u(n) in
order to make the relation in Equation (38) valid for all delays.
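The integer part of the closed-loop search, Eq. (37) evaluated with the recursive update of Eq. (38), can be sketched as follows. The buffer layout (u(n) stored at `u[offset + n]`) and the function name are assumptions of this sketch, and the fractional refinement is not included.

```python
import math

def closed_loop_pitch(x, h, u, offset, t_min, t_max):
    """Integer lag maximising T_k of Eq. (37), using the update of Eq. (38).

    x: target (L samples); h: impulse response (at least L samples);
    u: excitation buffer with u(n) at u[offset + n], covering n = -t_max .. L-1.
    Returns (best_lag, best_T).
    """
    L = len(x)
    # Direct convolution for the first delay: y_k(n) = sum_{i=0}^{n} u(i - k) h(n - i)
    y = [sum(u[offset + i - t_min] * h[n - i] for i in range(n + 1)) for n in range(L)]
    best_k, best_t = t_min, -math.inf
    for k in range(t_min, t_max + 1):
        if k > t_min:
            # Eq. (38), run from n = L-1 downwards so y_{k-1}(n-1) is read before being overwritten
            for n in range(L - 1, 0, -1):
                y[n] = y[n - 1] + u[offset - k] * h[n]
            y[0] = u[offset - k] * h[0]
        energy = sum(v * v for v in y)
        if energy > 0.0:
            t = sum(xn * yn for xn, yn in zip(x, y)) / math.sqrt(energy)
            if t > best_t:
                best_k, best_t = k, t
    return best_k, best_t
```

For delays below 64, the caller would place the LP residual in u(n), n = 0, ..., 63, as described above.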

Once the optimum integer pitch delay is determined, the fractions from -3/4 to 3/4 with a step of 1/4 around that integer
are tested. The fractional pitch search is performed by interpolating the normalized correlation in Equation (37) and
searching for its maximum. Once the fractional pitch lag is determined, v'(n) is computed by interpolating the past
excitation signal u(n) at the given phase (fraction). (The interpolation is performed using two FIR filters (Hamming
windowed sinc functions); one for interpolating the term in Equation (37) with the sinc truncated at ±17 and the other
for interpolating the past excitation with the sinc truncated at ±63.) The filters have their cut-off frequency (-3 dB) at
6000 Hz in the oversampled domain, which means that the interpolation filters exhibit a low-pass frequency response.
Thus, even when the pitch delay is an integer value, the adaptive codebook excitation consists of a low-pass filtered
version of the past excitation at the given delay and not a direct copy thereof. Further, for delays smaller than the
subframe size, the adaptive codebook excitation is completed based on the low-pass filtered interpolated past excitation
and not by repeating the past excitation.

In order to enhance the pitch prediction performance in wideband signals, a frequency-dependent pitch predictor is
used. This is important in wideband signals since the periodicity does not necessarily extend over the whole spectrum. In
this algorithm, there are two signal paths associated with respective sets of pitch codebook parameters, wherein each
signal path comprises a pitch prediction error calculating device for calculating a pitch prediction error of a pitch
codevector from a pitch codebook search device. One of these two paths comprises a low-pass filter for filtering the
pitch codevector, and the pitch prediction error is calculated for these two signal paths. The signal path having the
lowest calculated pitch prediction error is selected, along with the associated pitch gain.

The low-pass filter used in the second path is of the form $B_{LP}(z) = 0.18z + 0.64 + 0.18z^{-1}$. Note that 1 bit is used to encode
the chosen path.

Thus, for the 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, there are two possibilities to generate the
adaptive codebook vector v(n): v(n) = v'(n) in the first path, or $v(n) = \sum_{i=-1}^{1} b_{LP}(i+1)\, v'(n+i)$ in the second path, where
$b_{LP} = [0.18, 0.64, 0.18]$. The path which results in minimum energy of the target signal $x_2(n)$ defined in Equation (40) is
selected for the filtered adaptive codebook vector. For the 6.60 and 8.85 kbit/s modes, v(n) is always

$$v(n) = \sum_{i=-1}^{1} b_{LP}(i+1)\, v'(n+i).$$
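The low-pass path can be sketched as a 3-tap filter over the interpolated codevector. The sketch assumes the caller supplies v'(n) with one extra sample at each end (v'(-1) and v'(L)); how those boundary samples are obtained is not specified here, and the function name is hypothetical.

```python
B_LP = (0.18, 0.64, 0.18)   # b_LP(0), b_LP(1), b_LP(2)

def lowpass_codevector(v_ext):
    """v(n) = sum_{i=-1}^{1} b_LP(i+1) v'(n+i), n = 0..L-1.

    v_ext holds v'(-1), v'(0), ..., v'(L): one extra sample at each end.
    """
    L = len(v_ext) - 2
    return [sum(B_LP[i + 1] * v_ext[n + 1 + i] for i in (-1, 0, 1)) for n in range(L)]
```

Since the taps sum to 1.0, the filter has unit gain at DC and only attenuates the upper part of the spectrum.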
The adaptive codebook gain is then found by

$$g_p = \frac{\sum_{n=0}^{63} x(n)\, y(n)}{\sum_{n=0}^{63} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2, \qquad (39)$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero-state response of H(z)W(z) to v(n)). To
ensure stability, the adaptive codebook gain $g_p$ is bounded by 0.95 if the adaptive codebook gains of the previous
subframes have been small and the LP filters of the previous subframes have been close to being unstable.
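A sketch of Eq. (39) and of the path decision follows. Equation (40) is not reproduced in this section, so the sketch assumes the usual CELP form $x_2(n) = x(n) - g_p y(n)$; the upper bound is a parameter so that the 0.95 limit can be used when the stability condition applies. Function names are hypothetical.

```python
def pitch_gain(x, y, g_max=1.2):
    """Eq. (39): g_p = <x, y> / <y, y>, bounded to [0, g_max]."""
    den = sum(v * v for v in y)
    g = sum(a * b for a, b in zip(x, y)) / den if den > 0.0 else 0.0
    return min(max(g, 0.0), g_max)

def select_path(x, y_paths, g_max=1.2):
    """Choose the path whose filtered codevector leaves the least target energy.

    Assumes x2(n) = x(n) - g_p y(n). Returns (path_index, g_p, x2).
    """
    best = None
    for idx, y in enumerate(y_paths):
        g = pitch_gain(x, y, g_max)
        x2 = [a - g * b for a, b in zip(x, y)]
        e = sum(v * v for v in x2)
        if best is None or e < best[0]:
            best = (e, idx, g, x2)
    return best[1], best[2], best[3]
```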

5.8 Algebraic codebook 
5.8.1 Codebook structure 

The codebook structure is based on interleaved single-pulse permutation (ISPP) design. The 64 positions in the 
codevector are divided into 4 tracks of interleaved positions, with 16 positions in each track. The different codebooks at 
the different rates are constructed by placing a certain number of signed pulses in the tracks (from 1 to 6 pulses per 
track). The codebook index, or codeword, represents the pulse positions and signs in each track. Thus, no codebook 
storage is needed, since the excitation vector at the decoder can be constructed through the information contained in the 
index itself (no lookup tables). 

An important feature of the used codebook is that it is a dynamic codebook consisting of an algebraic codebook
followed by an adaptive prefilter F(z), which enhances special spectral components in order to improve the synthesized
speech quality. A prefilter relevant to wideband signals is used whereby F(z) consists of two parts: a periodicity
enhancement part $1/(1 - 0.85z^{-T})$ and a tilt part $(1 - \beta_1 z^{-1})$, where T is the integer part of the pitch lag and $\beta_1$ is related to
the voicing of the previous subframe and is bounded by [0.0, 0.5]. The codebook search is performed in the algebraic
domain by combining the filter F(z) with the weighted synthesis filter prior to the codebook search. Thus, the impulse
response h(n) must be modified to include the prefilter F(z). That is, $h(n) \leftarrow h(n) * f(n)$.
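The modification $h(n) \leftarrow h(n) * f(n)$ can be sketched by running the truncated impulse response through the two parts of F(z). The function and argument names are hypothetical; `tilt` stands for the tilt factor of the prefilter and is clipped to [0.0, 0.5] as stated above.

```python
def prefilter_impulse_response(h, T, tilt, periodicity=0.85):
    """h(n) * f(n) truncated to len(h), F(z) = (1 - tilt z^-1) / (1 - periodicity z^-T)."""
    tilt = min(max(tilt, 0.0), 0.5)              # bounded by [0.0, 0.5]
    out = list(h)
    # Tilt part: run backwards so out[n - 1] still holds the original h(n - 1)
    for n in range(len(out) - 1, 0, -1):
        out[n] -= tilt * out[n - 1]
    # Periodicity part: recursive, so out[n - T] already contains earlier feedback
    for n in range(T, len(out)):
        out[n] += periodicity * out[n - T]
    return out
```

Because both parts are linear and time-invariant, the order in which they are applied does not change the result.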

The codebook structures of different bit rates are given below. 



5.8.1.1 23.85 and 23.05 kbit/s mode



In this codebook, the innovation vector contains 24 non-zero pulses. All pulses can have the amplitudes +1 or -1. The
64 positions in a subframe are divided into 4 tracks, where each track contains six pulses, as shown in Table 4.

Table 4. Potential positions of individual pulses in the algebraic codebook, 23.85 and 23.05 kbit/s

Track | Pulse | Positions
1 | i0, i4, i8, i12, i16, i20 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5, i9, i13, i17, i21 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6, i10, i14, i18, i22 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7, i11, i15, i19, i23 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63



The six pulses in one track are encoded with 22 bits.

This gives a total of 88 bits (22+22+22+22) for the algebraic code.



5.8.1.2 19.85 kbit/s mode



In this codebook, the innovation vector contains 18 non-zero pulses. All pulses can have the amplitudes +1 or -1. The
64 positions in a subframe are divided into 4 tracks, where each of the first two tracks contains five pulses and each of
the other tracks contains four pulses, as shown in Table 5.
Table 5. Potential positions of individual pulses in the algebraic codebook, 19.85 kbit/s

Track | Pulse | Positions
1 | i0, i4, i8, i12, i16 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5, i9, i13, i17 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6, i10, i14 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7, i11, i15 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63



The five pulses in one track are encoded with 20 bits. The four pulses in one track are encoded with 16 bits.
This gives a total of 72 bits (20+20+16+16) for the algebraic code.



5.8.1.3 18.25 kbit/s mode



In this codebook, the innovation vector contains 16 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 
64 positions in a subframe are divided into 4 tracks, where each track contains four pulses, as shown in Table 6. 

Table 6. Potential positions of individual pulses in the algebraic codebook, 18.25 kbit/s

Track | Pulse | Positions
1 | i0, i4, i8, i12 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5, i9, i13 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6, i10, i14 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7, i11, i15 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63



The four pulses in one track are encoded with 16 bits. 

This gives a total of 64 bits (16+16+16+16) for the algebraic code. 



5.8.1.4 15.85 kbit/s mode



In this codebook, the innovation vector contains 12 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 
64 positions in a subframe are divided into 4 tracks, where each track contains three pulses, as shown in Table 7. 

Table 7. Potential positions of individual pulses in the algebraic codebook, 15.85 kbit/s

Track | Pulse | Positions
1 | i0, i4, i8 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5, i9 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6, i10 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7, i11 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63



The three pulses in one track are encoded with 13 bits. 

This gives a total of 52 bits (13+13+13+13) for the algebraic code. 



5.8.1.5 14.25 kbit/s mode



In this codebook, the innovation vector contains 10 non-zero pulses. All pulses can have the amplitudes +1 or -1. The
64 positions in a subframe are divided into 4 tracks, where each track contains two or three pulses, as shown in Table 8.

Table 8. Potential positions of individual pulses in the algebraic codebook, 14.25 kbit/s

Track | Pulse | Positions
1 | i0, i4, i8 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5, i9 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63
In each track containing two pulses, the two pulse positions are encoded with 8 bits (4 bits for the position of each pulse), and the sign of the
first pulse in the track is encoded with 1 bit.

The three pulses in one track are encoded with 13 bits.

This gives a total of 44 bits (13+13+9+9) for the algebraic code.



5.8.1.6 12.65 kbit/s mode



In this codebook, the innovation vector contains 8 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 64 
positions in a subframe are divided into 4 tracks, where each track contains two pulses, as shown in Table 9. 

Table 9. Potential positions of individual pulses in the algebraic codebook, 12.65 kbit/s

Track | Pulse | Positions
1 | i0, i4 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63



The two pulse positions in each track are encoded with 8 bits (a total of 32 bits, 4 bits for the position of each pulse), and
the sign of the first pulse in the track is encoded with 1 bit (a total of 4 bits). This gives a total of 36 bits for the algebraic
code.



5.8.1.7 8.85 kbit/s mode



In this codebook, the innovation vector contains 4 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 64 
positions in a subframe are divided into 4 tracks, where each track contains one pulse, as shown in Table 10. 

Table 10. Potential positions of individual pulses in the algebraic codebook, 8.85 kbit/s

Track | Pulse | Positions
1 | i0 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63



The pulse position in each track is encoded with 4 bits and the sign of the pulse in the track is encoded with 1 bit. This
gives a total of 20 bits for the algebraic code.



5.8.1.8 6.60 kbit/s mode



In this codebook, the innovation vector contains 2 non-zero pulses. All pulses can have the amplitudes +1 or -1. The 64 
positions in a subframe are divided into 2 tracks, where each track contains one pulse, as shown in Table 11.

Table 11. Potential positions of individual pulses in the algebraic codebook, 6.60 kbit/s

Track | Pulse | Positions
1 | i0 | 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62
2 | i1 | 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63



The pulse position in each track is encoded with 5 bits and the sign of the pulse in the track is encoded with 1 bit. This
gives a total of 12 bits for the algebraic code.
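The bit budgets stated in Sections 5.8.1.1 to 5.8.1.8 can be cross-checked with a few lines of code. The dictionaries simply restate the per-track figures given in the text and in Tables 4 to 10; the names are hypothetical.

```python
# Bits per 16-position track for 1..6 pulses, as stated in Section 5.8.1
BITS_PER_TRACK = {1: 5, 2: 9, 3: 13, 4: 16, 5: 20, 6: 22}

# Pulses in each of the four tracks, from Tables 4 to 10
PULSES_PER_TRACK = {
    "23.85/23.05": (6, 6, 6, 6),
    "19.85": (5, 5, 4, 4),
    "18.25": (4, 4, 4, 4),
    "15.85": (3, 3, 3, 3),
    "14.25": (3, 3, 2, 2),
    "12.65": (2, 2, 2, 2),
    "8.85": (1, 1, 1, 1),
}

def algebraic_bits(mode):
    """Total algebraic-codebook bits per subframe for a 4-track mode."""
    return sum(BITS_PER_TRACK[p] for p in PULSES_PER_TRACK[mode])

def track_positions(track, n_tracks=4, subframe=64):
    """Positions of a 0-based track, e.g. track 3 of 4 gives 3, 7, ..., 63."""
    return list(range(track, subframe, n_tracks))

# 6.60 kbit/s: 2 tracks of 32 positions, one pulse each (5 position bits + 1 sign bit)
BITS_660 = 2 * (5 + 1)
```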
5.8.2 Pulse indexing



In the above section, the number of bits needed to encode a number of pulses in a track was given. In this section, the
procedures used for encoding from 1 to 6 pulses per track will be described. The description will be given for the case
of 4 tracks per subframe, with 16 positions per track and pulse spacing of 4 (which is the case for all modes except the
6.60 kbit/s mode).

Encoding 1 signed pulse per track 

The pulse position index is encoded with 4 bits and the sign index with 1 bit. The position index is given by the pulse
position in the subframe divided by the pulse spacing (integer division). The division remainder gives the track index.
For example, a pulse at position 31 has a position index of 31/4 = 7 and it belongs to the track with index 3 (4th track).

The sign index here is set to 0 for positive signs and 1 for negative signs.

The index of the signed pulse is given by

$$I_{1p} = p + s \times 2^M$$

where p is the position index, s is the sign index, and M = 4 is the number of bits used to encode the position index.

Encoding 2 signed pulses per track 

In the case of two pulses per track of $K = 2^M$ potential positions (here $M = 4$), each pulse needs 1 bit for the sign and $M$ bits for the position, which gives a total of $2M+2$ bits. However, some redundancy exists because the ordering of the pulses is unimportant. For example, placing the first pulse at position $p$ and the second pulse at position $q$ is equivalent to placing the first pulse at position $q$ and the second pulse at position $p$. One bit can be saved by encoding only one sign and deducing the second sign from the ordering of the positions in the index, so that $2M+1$ bits are used. Here the index is given by

$$ I_{2p} = p_1 + p_0 \times 2^{M} + s \times 2^{2M} $$

where $s$ is the sign index of the pulse at position index $p_0$. If the two signs are equal, the smaller position is set to $p_0$ and the larger position to $p_1$. If the two signs are not equal, the larger position is set to $p_0$ and the smaller position to $p_1$. At the decoder, the sign of the pulse at position $p_0$ is readily available. The second sign is deduced from the pulse ordering: if $p_0$ is larger than $p_1$, the sign of the pulse at position $p_1$ is opposite to that at position $p_0$; otherwise the two signs are set equal.
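The ordering rule can be checked with a short Python sketch operating on position indices within one track ($0 \le p < 2^M$). The names are hypothetical. The sketch assumes that two pulses with opposite signs never share the same position, since that combination cannot be expressed by the ordering rule.

```python
M = 4  # position bits per track

def encode_2p(pa, sa, pb, sb, m=M):
    """I_2p = p1 + p0 * 2^m + s * 2^(2m) for two signed pulses (positions in [0, 2^m))."""
    # Equal signs: the smaller position goes to p0. Different signs: the larger one does.
    (p0, s0), (p1, _) = sorted([(pa, sa), (pb, sb)], reverse=(sa != sb))
    s = 0 if s0 > 0 else 1             # only the sign of the pulse at p0 is transmitted
    return p1 + (p0 << m) + (s << (2 * m))

def decode_2p(index, m=M):
    """Recover [(p0, sign0), (p1, sign1)]; sign1 follows from the position ordering."""
    mask = (1 << m) - 1
    p1 = index & mask
    p0 = (index >> m) & mask
    s0 = -1 if (index >> (2 * m)) & 1 else 1
    s1 = -s0 if p0 > p1 else s0        # p0 > p1 signals opposite signs
    return [(p0, s0), (p1, s1)]
```

For example, two positive pulses at position indices 3 and 10 give 10 + 3 x 16 = 58, and the 9-bit index never exceeds 511.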

Encoding 3 signed pulses per track 

In the case of three pulses per track, similar logic can be used as in the case of two pulses. For a track with $2^M$ positions, $3M+1$ bits are needed instead of $3M+3$ bits. A simple way of indexing the pulses is to divide the track positions into two sections (halves) and identify a section that contains at least two pulses. The number of positions in a section is $K/2 = 2^M/2 = 2^{M-1}$, which can be represented with $M-1$ bits. The two pulses in the section containing at least two pulses are encoded with the procedure for encoding 2 signed pulses, which requires $2(M-1)+1$ bits, and the remaining pulse, which can be anywhere in the track (in either section), is encoded with $M+1$ bits. Finally, the index of the section that contains the two pulses is encoded with 1 bit. Thus the total number of required bits is $2(M-1)+1 + M+1 + 1 = 3M+1$.

Whether two pulses lie in the same section can be checked by comparing the most significant bits (MSB) of their position indices. An MSB of 0 means that the position belongs to the lower half of the track (0-7), and an MSB of 1 means that it belongs to the upper half (8-15). If the two pulses belong to the upper half, they need to be shifted to the range 0-7 before encoding them with $2 \times 3 + 1$ bits. This is done by masking the position indices with a mask consisting of $M-1$ ones (the number 7 in this case), which keeps only their $M-1$ least significant bits (LSB).

The index of the 3 signed pulses is given by

$$ I_{3p} = I_{2p} + k \times 2^{2M-1} + I_{1p} \times 2^{2M} $$

where $I_{2p}$ is the index of the two pulses in the same section, $k$ is the section index (0 or 1), and $I_{1p}$ is the index of the third pulse in the track.
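A minimal Python sketch of the 3-pulse index follows. Pulses are (position index, sign) pairs within one track. It assumes distinct positions, and it treats section index $k = 0$ as the lower half and $k = 1$ as the upper half, matching the MSB test above. The two-pulse helpers are redefined so that the block runs on its own, and all names are illustrative.

```python
M = 4  # position bits per track

def enc_2p(a, b, m):
    """Two-pulse index (2m+1 bits) for (position, sign) pairs with positions in [0, 2^m)."""
    (p0, s0), (p1, _) = sorted([a, b], reverse=(a[1] != b[1]))
    return p1 + (p0 << m) + ((0 if s0 > 0 else 1) << (2 * m))

def dec_2p(index, m):
    mask = (1 << m) - 1
    p1, p0 = index & mask, (index >> m) & mask
    s0 = -1 if (index >> (2 * m)) & 1 else 1
    return [(p0, s0), (p1, -s0 if p0 > p1 else s0)]

def encode_3p(pulses, m=M):
    """I_3p = I_2p + k * 2^(2m-1) + I_1p * 2^(2m), using 3m+1 bits."""
    half = 1 << (m - 1)                          # positions per section
    # Pigeonhole: among 3 pulses, at least one pair shares a section (same MSB).
    i, j = next((i, j) for i, j in ((0, 1), (0, 2), (1, 2))
                if pulses[i][0] // half == pulses[j][0] // half)
    k = pulses[i][0] // half                     # section index of that pair
    mask = half - 1                              # keep the m-1 LSBs
    i_2p = enc_2p((pulses[i][0] & mask, pulses[i][1]),
                  (pulses[j][0] & mask, pulses[j][1]), m - 1)
    p, s = pulses[3 - i - j]                     # third pulse, anywhere in the track
    i_1p = p + ((0 if s > 0 else 1) << m)
    return i_2p + (k << (2 * m - 1)) + (i_1p << (2 * m))

def decode_3p(index, m=M):
    half = 1 << (m - 1)
    i_2p = index & ((1 << (2 * m - 1)) - 1)
    k = (index >> (2 * m - 1)) & 1
    i_1p = index >> (2 * m)
    pair = [(p + k * half, s) for p, s in dec_2p(i_2p, m - 1)]   # undo the masking
    third = (i_1p & ((1 << m) - 1), -1 if (i_1p >> m) & 1 else 1)
    return pair + [third]
```

The pair search always succeeds because two halves cannot hold three pulses without one of them holding two.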

Encoding 4 signed pulses per track 

The 4 signed pulses in a track of length $K = 2^M$ can be encoded using $4M$ bits. Similar to the case of 3 pulses, the $K$ positions in the track are divided into 2 sections (two halves), where each section contains $K/2 = 2^{M-1}$ positions. Here we denote the sections as Section A, with positions 0 to $K/2-1$, and Section B, with positions $K/2$ to $K-1$. Each section can contain from 0 to 4 pulses. The table below shows the 5 cases representing the possible number of pulses in each section:



case   Pulses in Section A   Pulses in Section B   Bits needed
0      0                     4                     4M-3
1      1                     3                     4M-2
2      2                     2                     4M-2
3      3                     1                     4M-2
4      4                     0                     4M-3



In cases 0 or 4, the 4 pulses in a section of length $K/2 = 2^{M-1}$ can be encoded using $4(M-1)+1 = 4M-3$ bits (as explained at the end of this subsection).

In cases 1 or 3, the single pulse in a section of length $K/2 = 2^{M-1}$ can be encoded with $(M-1)+1 = M$ bits, and the 3 pulses in the other section can be encoded with $3(M-1)+1 = 3M-2$ bits. This gives a total of $M + 3M-2 = 4M-2$ bits.

In case 2, the 2 pulses in each section of length $K/2 = 2^{M-1}$ can be encoded with $2(M-1)+1 = 2M-1$ bits. Thus for both sections, $2(2M-1) = 4M-2$ bits are required.

The case index can now be encoded with 2 bits (4 possible cases), assuming cases 0 and 4 are combined. For cases 1, 2, or 3, the number of needed bits is $4M-2$, which gives a total of $4M-2+2 = 4M$ bits. For cases 0 or 4, one bit is needed to identify which of the two cases applies, and $4M-3$ bits are needed to encode the 4 pulses in the section. Adding the 2 bits of the case index gives a total of $1 + 4M-3 + 2 = 4M$ bits.

The index of the 4 signed pulses is given by

$$ I_{4p} = I_{AB} + k \times 2^{4M-2} $$

where $k$ is the case index (2 bits) and $I_{AB}$ is the index of the pulses in both sections for each individual case.
For cases 0 and 4, $I_{AB}$ is given by

$$ I_{AB,0/4} = I_{4p,\mathrm{section}} + j \times 2^{4M-3} $$

where $j$ is a 1-bit index identifying the section with 4 pulses and $I_{4p,\mathrm{section}}$ is the index of the 4 pulses in that section (which requires $4M-3$ bits).

For case 1, $I_{AB}$ is given by

$$ I_{AB,1} = I_{3p,B} + I_{1p,A} \times 2^{3(M-1)+1} $$

where $I_{3p,B}$ is the index of the 3 pulses in Section B ($3(M-1)+1$ bits) and $I_{1p,A}$ is the index of the pulse in Section A ($(M-1)+1$ bits).

For case 2, $I_{AB}$ is given by

$$ I_{AB,2} = I_{2p,B} + I_{2p,A} \times 2^{2(M-1)+1} $$

where $I_{2p,B}$ is the index of the 2 pulses in Section B ($2(M-1)+1$ bits) and $I_{2p,A}$ is the index of the 2 pulses in Section A ($2(M-1)+1$ bits).

Finally, for case 3, $I_{AB}$ is given by

$$ I_{AB,3} = I_{1p,B} + I_{3p,A} \times 2^{M} $$

where $I_{1p,B}$ is the index of the pulse in Section B ($(M-1)+1$ bits) and $I_{3p,A}$ is the index of the 3 pulses in Section A ($3(M-1)+1$ bits).

For cases 0 and 4, it was mentioned that the 4 pulses in one section are encoded using $4(M-1)+1$ bits. This is done by further dividing the section into 2 subsections of length $K/4 = 2^{M-2}$ (= 4 in this case); identifying a subsection that contains at least 2 pulses; coding the 2 pulses in that subsection using $2(M-2)+1 = 2M-3$ bits; coding the index of the subsection that contains at least 2 pulses using 1 bit; and coding the remaining 2 pulses, assuming that they can be anywhere in the section, using $2(M-1)+1 = 2M-1$ bits. This gives a total of $(2M-3) + 1 + (2M-1) = 4M-3$ bits.
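The field layout of the 4-pulse index can be sketched in Python. The sub-indices ($I_{1p}$, $I_{2p}$, $I_{3p}$ and the 4-pulses-in-one-section index) are passed in as already computed by the procedures above; the keyword names are hypothetical. Two details are assumptions because the text does not fix them: $j = 1$ marks Section B as the section holding the 4 pulses, and the combined cases 0/4 use case index $k = 0$.

```python
M = 4  # position bits per track

def _field(value, width):
    """Check that a sub-index fits in its bit field."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"sub-index {value} does not fit in {width} bits")
    return value

def four_pulse_case(positions, m=M):
    """Case number = number of pulses in Section A (positions 0 .. K/2-1)."""
    return sum(1 for p in positions if p < (1 << (m - 1)))

def pack_4p(case, m=M, **sub):
    """Assemble I_4p = I_AB + k * 2^(4m-2) from precomputed sub-indices."""
    n = m - 1  # position bits inside one section
    if case in (0, 4):
        j = 1 if case == 0 else 0  # assumed: j = 1 when Section B holds the 4 pulses
        i_ab = _field(sub["i_4p_section"], 4 * n + 1) + (j << (4 * m - 3))
        k = 0
    elif case == 1:
        i_ab = _field(sub["i_3p_b"], 3 * n + 1) + (_field(sub["i_1p_a"], n + 1) << (3 * n + 1))
        k = 1
    elif case == 2:
        i_ab = _field(sub["i_2p_b"], 2 * n + 1) + (_field(sub["i_2p_a"], 2 * n + 1) << (2 * n + 1))
        k = 2
    elif case == 3:
        i_ab = _field(sub["i_1p_b"], n + 1) + (_field(sub["i_3p_a"], 3 * n + 1) << m)
        k = 3
    else:
        raise ValueError("case must be 0..4")
    return i_ab + (k << (4 * m - 2))
```

With all-ones sub-indices, every case fills exactly the $4M = 16$ available bits without overflowing.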

Encoding 5 signed pulses per track 

The 5 signed pulses in a track of length $K = 2^M$ can be encoded using $5M$ bits. Similar to the case of 4 pulses, the $K$ positions in the track are divided into 2 sections, A and B. Each section can contain from 0 to 5 pulses. A simple approach to encode the 5 pulses is to identify a section that contains at least 3 pulses, to encode the 3 pulses in that section using $3(M-1)+1 = 3M-2$ bits, and to encode the remaining 2 pulses in the whole track using $2M+1$ bits. This gives $5M-1$ bits. An extra bit is needed to identify the section that contains at least 3 pulses. Thus a total of $5M$ bits are needed to encode the 5 signed pulses.



The index of the 5 signed pulses is given by

$$ I_{5p} = I_{2p} + I_{3p} \times 2^{2M+1} + k \times 2^{5M-1} $$

where $k$ is the index of the section that contains at least 3 pulses, $I_{3p}$ is the index of the 3 pulses in that section ($3(M-1)+1$ bits), and $I_{2p}$ is the index of the remaining 2 pulses in the track ($2M+1$ bits).
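A minimal Python sketch of the 5-pulse split and the field packing follows. It assumes that $k = 0$ denotes Section A (lower half) and $k = 1$ Section B, and it takes $I_{2p}$ and $I_{3p}$ as already computed. The function names are illustrative only.

```python
M = 4  # position bits per track

def split_5p(pulses, m=M):
    """Return (k, three pulses in section k, remaining two pulses).

    Section k = 0 holds positions 0 .. 2^(m-1)-1 (Section A), k = 1 the upper half.
    With 5 pulses and 2 sections, at least one section holds 3 or more pulses."""
    half = 1 << (m - 1)
    k = 0 if sum(1 for p, _ in pulses if p < half) >= 3 else 1
    idx = [i for i, (p, _) in enumerate(pulses) if (p >= half) == (k == 1)][:3]
    triple = [pulses[i] for i in idx]
    rest = [pl for i, pl in enumerate(pulses) if i not in idx]
    return k, triple, rest

def pack_5p(i_2p, i_3p, k, m=M):
    """I_5p = I_2p + I_3p * 2^(2m+1) + k * 2^(5m-1)."""
    if not (0 <= i_2p < 1 << (2 * m + 1) and 0 <= i_3p < 1 << (3 * m - 2) and k in (0, 1)):
        raise ValueError("field out of range")
    return i_2p + (i_3p << (2 * m + 1)) + (k << (5 * m - 1))

def unpack_5p(index, m=M):
    """Split a 5m-bit index back into (I_2p, I_3p, k)."""
    i_2p = index & ((1 << (2 * m + 1)) - 1)
    i_3p = (index >> (2 * m + 1)) & ((1 << (3 * m - 2)) - 1)
    return i_2p, i_3p, index >> (5 * m - 1)
```

If a section holds 4 or 5 pulses, any 3 of them can form the triple. The other two pulses are then coded over the whole track, which is why $I_{2p}$ uses $M$ rather than $M-1$ position bits.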

Encoding 6 signed pulses per track 

The 6 signed pulses in a track of length $K = 2^M$ are encoded using $6M-2$ bits. Similar to the case of 5 pulses, the $K$ positions in the track are divided into 2 sections, A and B. Each section can contain from 0 to 6 pulses. The table below shows the 7 cases representing the possible number of pulses in each section:



case   Pulses in Section A   Pulses in Section B   Bits needed
0      0                     6                     6M-5
1      1                     5                     6M-5
2      2                     4                     6M-5
3      3                     3                     6M-4
4      4                     2                     6M-5
5      5                     1                     6M-5
6      6                     0                     6M-5



Note that cases 0 and 6 are similar except that the 6 pulses are in different sections. Similarly, cases 1 and 5, as well as cases 2 and 4, differ only in the section that contains more pulses. Therefore these cases can be coupled, and an extra bit can be assigned to identify the section that contains more pulses. Since these cases initially need $6M-5$ bits, the coupled cases need $6M-4$ bits taking into account the section bit. Thus there are now 4 states of coupled cases, that is (0,6), (1,5), (2,4), and (3), with 2 extra bits needed for the state. This gives a total of $6M-4+2 = 6M-2$ bits for the 6 signed pulses.

In cases 0 and 6, 1 bit is needed to identify the section that contains the 6 pulses. Five of these pulses are encoded using $5(M-1)$ bits (since the pulses are confined to that section), and the remaining pulse is encoded using $(M-1)+1$ bits. Thus a total of $1 + 5(M-1) + M = 6M-4$ bits are needed for this coupled case. Two extra bits are needed to encode the state of the coupled case, giving a total of $6M-2$ bits. For this coupled case, the index of the 6 pulses is given by

$$ I_{6p} = I_{1p} + I_{5p} \times 2^{M} + j \times 2^{6M-5} + k \times 2^{6M-4} $$

where $k$ is the index of the coupled case (2 bits), $j$ is the index of the section containing the 6 pulses (1 bit), $I_{5p}$ is the index of 5 pulses in that section ($5(M-1)$ bits), and $I_{1p}$ is the index of the remaining pulse in that section ($(M-1)+1$ bits).

In cases 1 and 5, 1 bit is needed to identify the section that contains 5 pulses. The 5 pulses in that section are encoded using $5(M-1)$ bits and the pulse in the other section is encoded using $(M-1)+1$ bits. For this coupled case, the index of the 6 pulses is given by

$$ I_{6p} = I_{1p} + I_{5p} \times 2^{M} + j \times 2^{6M-5} + k \times 2^{6M-4} $$

where $k$ is the index of the coupled case (2 bits), $j$ is the index of the section containing 5 pulses (1 bit), $I_{5p}$ is the index of the 5 pulses in that section ($5(M-1)$ bits), and $I_{1p}$ is the index of the pulse in the other section ($(M-1)+1$ bits).
In cases 2 or 4, 1 bit is needed to identify the section that contains 4 pulses. The 4 pulses in that section are encoded using $4(M-1)$ bits and the 2 pulses in the other section are encoded using $2(M-1)+1$ bits. For this coupled case, the index of the 6 pulses is given by

$$ I_{6p} = I_{2p} + I_{4p} \times 2^{2(M-1)+1} + j \times 2^{6M-5} + k \times 2^{6M-4} $$

where $k$ is the index of the coupled case (2 bits), $j$ is the index of the section containing 4 pulses (1 bit), $I_{4p}$ is the index of the 4 pulses in that section ($4(M-1)$ bits), and $I_{2p}$ is the index of the 2 pulses in the other section ($2(M-1)+1$ bits).

In case 3, the 3 pulses in each section are encoded using $3(M-1)+1$ bits per section. For this case, the index of the 6 pulses is given by

$$ I_{6p} = I_{3p,B} + I_{3p,A} \times 2^{3(M-1)+1} + k \times 2^{6M-4} $$

where $k$ is the index of the coupled case (2 bits), $I_{3p,B}$ is the index of the 3 pulses in Section B ($3(M-1)+1$ bits), and $I_{3p,A}$ is the index of the 3 pulses in Section A ($3(M-1)+1$ bits).
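The bit layout of the four coupled states can be checked with a small Python sketch that assembles $I_{6p}$ from precomputed sub-indices. The keyword names are hypothetical. Two mappings are assumptions because the text does not specify them: the states (0,6), (1,5), (2,4), (3) map to $k = 0, 1, 2, 3$, and $j = 1$ marks Section B as the section holding more pulses.

```python
M = 4  # position bits per track

def _field(value, width):
    """Check that a sub-index fits in its bit field."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"sub-index {value} does not fit in {width} bits")
    return value

def pack_6p(n_a, m=M, **sub):
    """Assemble the (6m-2)-bit index for 6 pulses, n_a of which lie in Section A."""
    n = m - 1                         # position bits inside one section
    if n_a == 3:                      # state (3): no section bit needed
        low = _field(sub["i_3p_b"], 3 * n + 1) + (_field(sub["i_3p_a"], 3 * n + 1) << (3 * n + 1))
        return low + (3 << (6 * m - 4))
    j = 0 if n_a > 3 else 1           # assumed: j = 1 when Section B holds more pulses
    if n_a in (0, 6, 1, 5):           # states (0,6) and (1,5): I_1p + I_5p * 2^m
        k = 0 if n_a in (0, 6) else 1
        low = _field(sub["i_1p"], n + 1) + (_field(sub["i_5p"], 5 * n) << m)
    elif n_a in (2, 4):               # state (2,4): I_2p + I_4p * 2^(2(m-1)+1)
        k = 2
        low = _field(sub["i_2p"], 2 * n + 1) + (_field(sub["i_4p"], 4 * n) << (2 * n + 1))
    else:
        raise ValueError("n_a must be 0..6")
    return low + (j << (6 * m - 5)) + (k << (6 * m - 4))
```

In every state the low part occupies exactly $6M-5$ bits (or $6M-4$ bits for state (3)), so the top 2 bits always carry the state.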

5.8.3 Codebook search 

The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesis speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution, that is

$$ x_2(n) = x(n) - g_p\,y(n), \quad n = 0,\ldots,63 \qquad (40) $$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $g_p$ is the unquantized adaptive codebook gain.

The matrix $\mathbf{H}$ is defined as the lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1),\ldots,h(63)$. The vector $\mathbf{d} = \mathbf{H}^T\mathbf{x}_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$ (also known as the backward filtered target vector), and $\mathbf{\Phi} = \mathbf{H}^T\mathbf{H}$ is the matrix of correlations of $h(n)$.

The elements of the vector $\mathbf{d}$ are computed by

$$ d(n) = \sum_{i=n}^{63} x_2(i)\,h(i-n), \quad n = 0,\ldots,63 \qquad (41) $$

and the elements of the symmetric matrix $\mathbf{\Phi}$ are computed by

$$ \phi(i,j) = \sum_{n=j}^{63} h(n-i)\,h(n-j), \quad i = 0,\ldots,63,\; j = i,\ldots,63 \qquad (42) $$
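Equations (40) to (42) translate directly into the following Python sketch. It works for a general subframe length $N$ = `len(x)` (64 in the codec) and uses the direct sums, so it illustrates the definitions rather than an efficient implementation. The function names are illustrative.

```python
def target_and_backward_filter(x, y, g_p, h):
    """Eqs. (40)-(41): x2(n) = x(n) - g_p y(n); d(n) = sum_{i=n}^{N-1} x2(i) h(i-n)."""
    n_len = len(x)
    x2 = [x[n] - g_p * y[n] for n in range(n_len)]
    d = [sum(x2[i] * h[i - n] for i in range(n, n_len)) for n in range(n_len)]
    return x2, d

def correlation_matrix(h):
    """Eq. (42): phi(i, j) = sum_{n=j}^{N-1} h(n-i) h(n-j) for j >= i, mirrored for j < i."""
    n_len = len(h)
    phi = [[0.0] * n_len for _ in range(n_len)]
    for i in range(n_len):
        for j in range(i, n_len):
            phi[i][j] = phi[j][i] = sum(h[n - i] * h[n - j] for n in range(j, n_len))
    return phi
```

Building $\mathbf{H}$ explicitly for a short impulse response and comparing against $\mathbf{H}^T\mathbf{x}_2$ and $\mathbf{H}^T\mathbf{H}$ confirms that both functions match the matrix definitions.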



If $\mathbf{c}_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the search criterion

$$ Q_k = \frac{(\mathbf{x}_2^T\mathbf{H}\mathbf{c}_k)^2}{\mathbf{c}_k^T\mathbf{\Phi}\mathbf{c}_k} = \frac{(\mathbf{d}^T\mathbf{c}_k)^2}{\mathbf{c}_k^T\mathbf{\Phi}\mathbf{c}_k} \qquad (43) $$

The vector $\mathbf{d}$ and the matrix $\mathbf{\Phi}$ are usually computed prior to the codebook search.

The algebraic structure of the codebooks allows for very fast search procedures, since the innovation vector $\mathbf{c}_k$ contains only a few nonzero pulses. The correlation in the numerator of Equation (43) is given by

$$ C = \sum_{i=0}^{N_p-1} a_i\,d(m_i) \qquad (44) $$

where $m_i$ is the position of the $i$th pulse, $a_i$ is its amplitude, and $N_p$ is the number of pulses. The energy in the denominator of Equation (43) is given by

$$ E = \sum_{i=0}^{N_p-1} \phi(m_i,m_i) + 2\sum_{i=0}^{N_p-2}\sum_{j=i+1}^{N_p-1} a_i a_j\,\phi(m_i,m_j) \qquad (45) $$
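A short Python sketch of Equations (44) and (45) is shown below. It assumes unit-magnitude amplitudes $a_i = \pm 1$, which is why the diagonal term of Equation (45) carries no $a_i^2$ factor. `d` and `phi` are plain lists, for example as produced by the definitions in Equations (41) and (42). The names are illustrative.

```python
def numerator_and_energy(d, phi, positions, amplitudes):
    """Eqs. (44)-(45) for a sparse codevector with amplitudes a_i = +/-1."""
    C = sum(a * d[m] for m, a in zip(positions, amplitudes))
    E = sum(phi[m][m] for m in positions)          # a_i^2 = 1 on the diagonal
    n_p = len(positions)
    for i in range(n_p - 1):
        for j in range(i + 1, n_p):
            E += 2 * amplitudes[i] * amplitudes[j] * phi[positions[i]][positions[j]]
    return C, E

def criterion(d, phi, positions, amplitudes):
    """Search criterion Q_k = C^2 / E of Eq. (43)."""
    C, E = numerator_and_energy(d, phi, positions, amplitudes)
    return C * C / E
```

Only the $N_p$ pulse positions are visited, so the cost grows with $N_p^2$ rather than with the subframe length. The results equal $\mathbf{d}^T\mathbf{c}_k$ and $\mathbf{c}_k^T\mathbf{\Phi}\mathbf{c}_k$ computed over the full vector.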

To simplify the search procedure, the pulse amplitudes are predetermined based on a reference signal $b(n)$. In this so-called signal-selected pulse amplitude approach, the sign of a pulse at position $i$ is set equal to the sign of the reference signal at that position. Here, the reference signal $b(n)$ is given by

$$ b(n) = \sqrt{\frac{E_d}{E_r}}\,r_{LTP}(n) + \alpha\,d(n) \qquad (46) $$

where $E_d = \mathbf{d}^T\mathbf{d}$ is the energy of the signal $d(n)$ and $E_r = \mathbf{r}_{LTP}^T\mathbf{r}_{LTP}$ is the energy of the signal $r_{LTP}(n)$, which is the residual signal after long-term prediction. The scaling factor $\alpha$ controls the amount of dependence of the reference signal on $d(n)$, and it is lowered as the bit rate is increased. Here $\alpha = 2$ for the 6.60 and 8.85 kbit/s modes; $\alpha = 1$ for the 12.65, 14.25, and 15.85 kbit/s modes; $\alpha = 0.8$ for the 18.25 kbit/s mode; $\alpha = 0.75$ for the 19.85 kbit/s mode; and $\alpha = 0.5$ for the 23.05 and 23.85 kbit/s modes.
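Equation (46) and the per-mode values of $\alpha$ can be written as a small Python sketch. The mode table keys, the guard for a zero-energy residual, and treating $b(n) = 0$ as positive when preselecting signs are all assumptions made for illustration. None of them is stated in the text.

```python
import math

# Scaling factor alpha per mode (kbit/s), as listed in the text.
ALPHA = {"6.60": 2.0, "8.85": 2.0, "12.65": 1.0, "14.25": 1.0, "15.85": 1.0,
         "18.25": 0.8, "19.85": 0.75, "23.05": 0.5, "23.85": 0.5}

def reference_signal(d, r_ltp, mode):
    """Eq. (46): b(n) = sqrt(E_d / E_r) r_LTP(n) + alpha d(n)."""
    e_d = sum(v * v for v in d)
    e_r = sum(v * v for v in r_ltp)
    scale = math.sqrt(e_d / e_r) if e_r > 0 else 0.0   # guard for a silent residual (assumption)
    alpha = ALPHA[mode]
    return [scale * r + alpha * dv for r, dv in zip(r_ltp, d)]

def preselected_signs(b):
    """Sign of each candidate pulse = sign of b(n); b(n) = 0 treated as positive (assumption)."""
    return [1 if v >= 0 else -1 for v in b]
```

The energy normalization makes the $r_{LTP}$ term comparable in level to $d(n)$, so $\alpha$ alone sets the balance between the two signals.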

To simplify the search, the signal $d(n)$ and the matrix $\mathbf{\Phi}$ are modified to incorporate the preselected signs. Let $s_b(n)$ denote the vector containing the signs of $b(n)$. The modified signal $d'(n)$ is given by

$$ d'(n) = s_b(n)\,d(n), \quad n = 0,\ldots,N-1 $$

and the modified autocorrelation matrix $\mathbf{\Phi}'$ is given by

$$ \phi'(i,j) = s_b(i)\,s_b(j)\,\phi(i,j), \quad i = 0,\ldots,N-1,\; j = i,\ldots,N-1 $$

The correlation in the numerator of the search criterion $Q_k$ is now given by

$$ R = \sum_{i=0}^{N_p-1} d'(m_i) $$

and the energy in the denominator of the search criterion $Q_k$ is given by

$$ E = \sum_{i=0}^{N_p-1} \phi'(m_i,m_i) + 2\sum_{i=0}^{N_p-2}\sum_{j=i+1}^{N_p-1} \phi'(m_i,m_j) $$

The goal of the search now is to determine the codevector with the best set of $N_p$ pulse positions, assuming the amplitudes of the pulses have been selected as described above. The basic selection criterion is the maximization of the above-mentioned ratio $Q_k$.
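Folding the preselected signs into $d'(n)$ and $\mathbf{\Phi}'$ removes the amplitude products from the search loops. The Python sketch below illustrates this; the names are hypothetical. With the signs taken as the amplitudes, $R$ and $E$ agree with $C$ and $E$ of Equations (44) and (45).

```python
def sign_modified(d, phi, signs):
    """d'(n) = s_b(n) d(n) and phi'(i, j) = s_b(i) s_b(j) phi(i, j)."""
    n_len = len(d)
    d_mod = [signs[n] * d[n] for n in range(n_len)]
    phi_mod = [[signs[i] * signs[j] * phi[i][j] for j in range(n_len)] for i in range(n_len)]
    return d_mod, phi_mod

def modified_criterion(d_mod, phi_mod, positions):
    """Numerator R, denominator E and Q_k = R^2 / E with preselected signs (no a_i products)."""
    R = sum(d_mod[m] for m in positions)
    E = sum(phi_mod[m][m] for m in positions)
    for i in range(len(positions) - 1):
        for j in range(i + 1, len(positions)):
            E += 2 * phi_mod[positions[i]][positions[j]]
    return R, E, R * R / E
```

The modification is computed once per subframe, before the search. The inner loops then only add table entries.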

In order to reduce the search complexity, a fast search procedure known as depth-first tree search is used, whereby the pulse positions are determined $N_m$ pulses at a time. More precisely, the $N_p$ available pulses are partitioned into $M$ non-empty subsets of $N_m$ pulses respectively, such that $N_1 + N_2 + \cdots + N_m + \cdots + N_M = N_p$. A particular choice of positions for the first $J = N_1 + N_2 + \cdots + N_m$ pulses considered is called a level-$m$ path or a path of length $J$. The basic criterion for a path of $J$ pulse positions is the ratio $Q_k(J)$ when only the $J$ relevant pulses are considered.

The search begins with subset #1 and proceeds with subsequent subsets according to a tree structure whereby subset $m$ is searched at the $m$th level of the tree. The purpose of the search at level 1 is to consider the $N_1$ pulses of subset #1 and their valid positions in order to determine one or a number of candidate paths of length $N_1$, which are the tree nodes at level 1. The path at each terminating node of level $m-1$ is extended to length $N_1 + N_2 + \cdots + N_m$ at level $m$ by considering $N_m$ new pulses and their valid positions. One or a number of candidate extended paths are determined to constitute level-$m$ nodes. The best codevector corresponds to the path of length $N_p$ that maximizes the criterion $Q_k(N_p)$ with respect to all level-$M$ nodes.

A special form of the depth-first tree search procedure is used here, in which two pulses are searched at a time, that is, $N_m = 2$, and these 2 pulses belong to two consecutive tracks. Further, instead of assuming that the matrix $\mathbf{\Phi}$ is precomputed and stored, which requires a memory of $N \times N$ words ($64 \times 64$ = 4k words), a memory-efficient approach is used. In this approach, the search is organized so that only part of the needed elements of the correlation matrix are precomputed and stored. This part corresponds to the correlations of the impulse response for potential pulse positions in consecutive tracks, as well as the correlations $\phi(j,j)$, $j = 0,\ldots,N-1$ (that is, the elements of the main diagonal of the matrix $\mathbf{\Phi}$).

In order to reduce complexity while testing possible combinations of two pulses, only a limited number of potential positions of the first pulse are tested. Further, in the case of a large number of pulses, some pulses in the higher levels of the search tree are fixed. In order to guess intelligently which potential pulse positions are considered for the first pulse, or to fix some pulse positions, a "pulse-position likelihood-estimate vector" $\mathbf{b}$ is used, which is based on speech-related signals. The $p$th component $b(p)$ of this estimate vector characterizes the probability of a pulse occupying position $p$ ($p = 0, 1, \ldots, N-1$) in the best codevector being searched for. Here the estimate vector $\mathbf{b}$ is the same vector used for preselecting the amplitudes, given in Equation (46).

The search procedures for all bit rate modes are similar. Two pulses are searched at a time, and these two pulses always correspond to consecutive tracks; that is, the two searched pulses are in tracks T0-T1, T1-T2, T2-T3, or T3-T0.

Before searching the positions, the sign of a pulse at a potential position $n$ is set to the sign of $b(n)$ at that position. Then the modified signal $d'(n)$ is computed as described above by including the predetermined signs.

For the first 2 pulses (1st tree level), the correlation in the numerator of the search criterion is given by

$$ R = d'(m_0) + d'(m_1) $$

and the energy in the denominator of the search criterion $Q_k$ is given by

$$ E = \phi'(m_0,m_0) + \phi'(m_1,m_1) + 2\phi'(m_0,m_1) $$

where the correlations $\phi'(m_i,m_j)$ have been modified to include the preselected signs at positions $m_i$ and $m_j$.

For subsequent levels, the numerator and denominator are updated by adding the contribution of two new pulses. Assuming that two new pulses at a certain tree level, with positions $m_k$ and $m_{k+1}$ from two consecutive tracks, are searched, the updated value of $R$ is given by

$$ R = R + d'(m_k) + d'(m_{k+1}) \qquad (47) $$

and the updated energy is given by

$$ E = E + \phi'(m_k,m_k) + \phi'(m_{k+1},m_{k+1}) + 2\phi'(m_k,m_{k+1}) + 2R_{hv}(m_k) + 2R_{hv}(m_{k+1}) \qquad (48) $$

where $R_{hv}(m)$ is the correlation between the impulse response $h(n)$ and a vector $v_h(n)$ containing the sum of delayed versions of the impulse response at the previously determined positions. That is,

$$ v_h(n) = \sum_{i=0}^{k-1} h(n-m_i) $$

and

$$ R_{hv}(m) = \sum_{n=m}^{N-1} h(n-m)\,v_h(n) $$

At each tree level, the values of $R_{hv}(m)$ are computed online for all possible positions in each of the two tracks being tested. It can be seen from Equation (48) that only the correlations $\phi'(m_k,m_{k+1})$ corresponding to pulse positions in two consecutive tracks need to be stored ($4 \times 16 \times 16$ words), along with the correlations $\phi'(m_k,m_k)$ corresponding to the diagonal of the matrix $\mathbf{\Phi}$ (64 words). Thus the memory requirement in the present algebraic structure is $1024 + 64 = 1088$ words instead of $64 \times 64 = 4096$ words.
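The level-by-level update of Equations (47) and (48) can be checked against a direct evaluation of $R$ and $E$. In the Python sketch below, the preselected signs are folded into $v_h(n)$ and $R_{hv}(m)$. This is an assumption made so that the update matches $\phi'$ exactly, since the formulas above show $v_h(n)$ without signs. For clarity the sketch evaluates $\phi$ on demand instead of using the stored partial tables, and all names are hypothetical.

```python
def phi(h, i, j):
    """phi(i, j) = sum_{n=max(i,j)}^{N-1} h(n-i) h(n-j)."""
    return sum(h[n - i] * h[n - j] for n in range(max(i, j), len(h)))

def r_hv(h, m, chosen, signs):
    """Correlation of h(n-m) with v_h(n) = sum over chosen p of s_b(p) h(n-p), times s_b(m)."""
    n_len = len(h)
    v_h = [sum(signs[p] * h[n - p] for p in chosen if n >= p) for n in range(n_len)]
    return signs[m] * sum(h[n - m] * v_h[n] for n in range(m, n_len))

def add_pulse_pair(R, E, chosen, m_k, m_k1, d, h, signs):
    """Eqs. (47)-(48): extend a path by two pulses at positions m_k and m_k1."""
    def phi_mod(a, b):
        return signs[a] * signs[b] * phi(h, a, b)
    R += signs[m_k] * d[m_k] + signs[m_k1] * d[m_k1]                  # Eq. (47)
    E += (phi_mod(m_k, m_k) + phi_mod(m_k1, m_k1) + 2 * phi_mod(m_k, m_k1)
          + 2 * r_hv(h, m_k, chosen, signs) + 2 * r_hv(h, m_k1, chosen, signs))  # Eq. (48)
    return R, E, chosen + [m_k, m_k1]
```

Starting from an empty path ($R = E = 0$, so $R_{hv} = 0$), the first call reproduces the first-level expressions, and each further call adds one tree level.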

The search procedures at the different bit rate modes are similar. The difference is in the number of pulses and, accordingly, the number of levels in the tree search. In order to keep a comparable search complexity across the different codebooks, the number of tested positions is kept similar.

The search in the 12.65 kbit/s mode is described here as an example. In this mode, 2 pulses are placed in each track, giving a total of 8 pulses per subframe of length 64. Two pulses are searched at a time, and these two pulses always correspond to consecutive tracks; that is, the two searched pulses are in tracks T0-T1, T1-T2, T2-T3, or T3-T0. The tree has 4 levels in this case. At the first level, pulse P0 is assigned to track T0 and pulse P1 to track T1. At this level, no search is performed and the two pulse positions are set to the maximum of $b(n)$ in each track. At the second level, pulse P2 is assigned to track T2 and pulse P3 to track T3. 4 positions for pulse P2 are tested against all 16 positions of pulse P3; the 4 tested positions of P2 are determined based on the maxima of $b(n)$ in the track. At the third level, pulse P4 is assigned to track T1 and pulse P5 to track T2. 8 positions for pulse P4 are tested against all 16 positions of pulse P5. As at the previous search level, the 8 tested positions of P4 are determined based on the maxima of $b(n)$ in the track. At the fourth level, pulse P6 is assigned to track T3 and pulse P7 to track T0. 8 positions for pulse P6 are tested against all 16 positions of pulse P7. Thus the total number of tested combinations is 4x16 + 8x16 + 8x16 = 320. The whole process is repeated 4 times (4 iterations) by assigning the pulses to different tracks. For example, in the 2nd iteration, pulses P0 to P7 are assigned to tracks T1, T2, T3, T0, T2, T3, T0, and T1, respectively. Thus the total number of tested position combinations is 4x320 = 1280.

As another search example, in the 15.85 kbit/s mode, 3 pulses are placed in each track, giving a total of 12 pulses. There are 6 levels in the tree search, whereby two pulses are searched at each level. In the first two levels, 4 pulses are set to the maxima of $b(n)$. In the subsequent 4 levels, the numbers of tested combinations are 4x16, 6x16, 8x16, and 8x16, respectively. 4 iterations are used, giving a total of 4x26x16 = 1664 combinations.

5.9 Quantization of the adaptive and fixed codebook gains 

The adaptive codebook gain (pitch gain) and the fixed (algebraic) codebook gain are vector quantized using a 6-bit 
codebook for modes 8.85 and 6.60 kbit/s and using a 7-bit codebook for all the other modes. 

The fixed codebook gain quantization is performed using MA prediction with fixed coefficients. The 4th-order MA prediction is performed on the innovation energy as follows. Let $E(n)$ be the mean-removed innovation energy (in dB) at subframe $n$, given by

$$ E(n) = 10\log_{10}\left(\frac{1}{N}\,g_c^2\sum_{i=0}^{N-1} c^2(i)\right) - \bar{E} \qquad (49) $$

where $N = 64$ is the subframe size, $c(i)$ is the fixed codebook excitation, and $\bar{E} = 30$ dB is the mean of the innovation energy. The predicted energy is given by

$$ \tilde{E}(n) = \sum_{i=1}^{4} b_i\,\hat{R}(n-i) \qquad (50) $$

where $[b_1\; b_2\; b_3\; b_4] = [0.5,\, 0.4,\, 0.3,\, 0.2]$ are the MA prediction coefficients, and $\hat{R}(k)$ is the quantized energy prediction error at subframe $k$. The predicted energy is used to compute a predicted fixed codebook gain $g'_c$ as in Equation (49), by substituting $E(n)$ with $\tilde{E}(n)$ and $g_c$ with $g'_c$. This is done as follows. First, the mean innovation energy is found by



f 1 'v-i 



1 jv — 1 



vA^ ,.0 



(51) 



£, = 10 log 
and then the predicted gain g'c is found by 

^■^^^QOmCE(n).E-E,)_ (52) 

A correction factor between the gain g^ and the estimated one gV is given by 

r=8c/8c- (53) 

Note that the prediction error is given by 

R(n) = E{n) - E{n) = 20 log (y). ( 54) 
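The chain from Equation (50) to Equation (54) can be followed in a short Python sketch. It takes the past quantized prediction errors $\hat{R}(n-1), \ldots, \hat{R}(n-4)$ as a list in dB and uses floating point throughout. It is not intended to reproduce the bit-exact fixed-point arithmetic of the codec, and the names are illustrative.

```python
import math

B = (0.5, 0.4, 0.3, 0.2)   # MA prediction coefficients b_1..b_4
E_MEAN = 30.0              # mean innovation energy E-bar, in dB

def innovation_energy_db(c):
    """E_i = 10 log10((1/N) sum c^2), Eq. (51)."""
    return 10.0 * math.log10(sum(v * v for v in c) / len(c))

def predicted_gain(c, past_r):
    """Eqs. (50)-(52); past_r = [R(n-1), R(n-2), R(n-3), R(n-4)] in dB."""
    e_pred = sum(b * r for b, r in zip(B, past_r))                         # Eq. (50)
    g_pred = 10.0 ** (0.05 * (e_pred + E_MEAN - innovation_energy_db(c)))  # Eq. (52)
    return g_pred, e_pred

def correction_factor(g_c, g_pred):
    """Eqs. (53)-(54): gamma = g_c / g'_c and R(n) = 20 log10(gamma)."""
    gamma = g_c / g_pred
    return gamma, 20.0 * math.log10(gamma)
```

Substituting $g'_c$ back into Equation (49) returns exactly $\tilde{E}(n)$, which is the property that defines the predicted gain.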
The pitch gain $g_p$ and the correction factor $\gamma$ are jointly vector quantized using a 6-bit codebook for the 8.85 and 6.60 kbit/s modes and a 7-bit codebook for the other modes. The gain codebook search is performed by minimizing the mean-square of the weighted error between the original and reconstructed speech, which is given by

$$ E = \mathbf{x}^T\mathbf{x} + g_p^2\,\mathbf{y}^T\mathbf{y} + g_c^2\,\mathbf{z}^T\mathbf{z} - 2g_p\,\mathbf{x}^T\mathbf{y} - 2g_c\,\mathbf{x}^T\mathbf{z} + 2g_p g_c\,\mathbf{y}^T\mathbf{z} \qquad (55) $$

where $\mathbf{x}$ is the target vector, $\mathbf{y}$ is the filtered adaptive codebook vector, and $\mathbf{z}$ is the filtered fixed codebook vector. (Each gain vector in the codebook also has an element representing the quantized energy prediction error.) The quantized energy prediction error associated with the chosen gains is used to update $\hat{R}(n)$. In the search, only the 64 codevectors that are closest to the unquantized pitch gain $g_p$ are taken into account.
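A simplified Python sketch of the gain codebook search follows. The codebook is a hypothetical list of $(g_p, \gamma)$ pairs; the actual 6-bit and 7-bit tables, and their quantized-energy-error element, are not reproduced here. Each candidate fixed codebook gain is formed as $g_c = \gamma\,g'_c$, as implied by Equation (53).

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def weighted_error(x, y, z, g_p, g_c):
    """Eq. (55): expansion of ||x - g_p y - g_c z||^2."""
    return (dot(x, x) + g_p ** 2 * dot(y, y) + g_c ** 2 * dot(z, z)
            - 2 * g_p * dot(x, y) - 2 * g_c * dot(x, z) + 2 * g_p * g_c * dot(y, z))

def search_gain_codebook(x, y, z, g_p_unq, g_c_pred, codebook, n_cand=64):
    """Return the index of the best (g_p, gamma) entry, with g_c = gamma * g'_c.
    Only the n_cand entries whose g_p is closest to the unquantized pitch gain are tried."""
    candidates = sorted(range(len(codebook)), key=lambda i: abs(codebook[i][0] - g_p_unq))[:n_cand]
    return min(candidates,
               key=lambda i: weighted_error(x, y, z, codebook[i][0], codebook[i][1] * g_c_pred))
```

In practice the six correlations in Equation (55) would be computed once per subframe, so each candidate costs only a few multiplications. The sketch recomputes them for readability.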



5.10 Memory update 



An update of the states of the synthesis and weighting filters is needed in order to compute the target signal in the next 
subframe. 

After the two gains have been quantized, the excitation signal $u(n)$ in the present subframe is found by

$$ u(n) = \hat{g}_p\,v(n) + \hat{g}_c\,c(n), \quad n = 0,\ldots,63 \qquad (56) $$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantized adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic code including pitch sharpening). The states of the filters can be updated by filtering the signal $r(n) - u(n)$ (the difference between residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)H_{de\text{-}emph}(z)$ for the 64-sample subframe and saving the states of the filters. This would require 3 filterings. A simpler approach, which requires only one filtering, is as follows. The local synthesis speech $\hat{s}(n)$ is computed by filtering the excitation signal through $1/\hat{A}(z)$. The output of the filter due to the input $r(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$, so the states of the synthesis filter $1/\hat{A}(z)$ are given by $e(n)$, $n = 48,\ldots,63$. Updating the states of the filter $A(z/\gamma_1)H_{de\text{-}emph}(z)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However, the signal $e_w(n)$ can be equivalently found by

$$ e_w(n) = x(n) - \hat{g}_p\,y(n) - \hat{g}_c\,z(n) \qquad (57) $$

Since the signals $x(n)$, $y(n)$, and $z(n)$ are available, the states of the weighting filter are updated by computing $e_w(n)$ as in Equation (57) for $n = 48,\ldots,63$. This saves two filterings.
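The shortcut relies on the linearity of the synthesis filter. Filtering $r(n) - u(n)$ from the difference of the two filter states gives the same result as subtracting the two separately filtered outputs. The Python sketch below demonstrates this with a generic all-pole filter and also computes the weighted-error memory via Equation (57). The filter form and the state ordering `[y(-1), ..., y(-p)]` are choices made for the illustration, and the names are hypothetical.

```python
def all_pole(x, a, state):
    """y(n) = x(n) - sum_{i=1}^{p} a_i y(n-i); state = [y(-1), ..., y(-p)]."""
    mem = list(state)
    out = []
    for v in x:
        y = v - sum(ai * mi for ai, mi in zip(a, mem))
        out.append(y)
        mem = [y] + mem[:-1]          # shift the newest output into the state
    return out

def filter_memory(signal, order=16):
    """Next-subframe state [y(N-1), ..., y(N-order)] taken from the last output samples."""
    return signal[::-1][:order]

def weighted_error_memory(x, y, z, g_p, g_c, order=16):
    """Eq. (57): e_w(n) = x(n) - g_p y(n) - g_c z(n); keep the last samples as filter state."""
    e_w = [xn - g_p * yn - g_c * zn for xn, yn, zn in zip(x, y, z)]
    return filter_memory(e_w, order)
```

With a 16th-order filter and a 64-sample subframe, `filter_memory` keeps samples 63 down to 48, the range quoted in the text.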



5.11 High-band gain generation



In order to compute the high-band gain for the 23.85 kbit/s mode, the 16 kHz input speech is filtered through a band-pass FIR filter $H_{HB}(z)$ with a passband from 6.4 to 7 kHz. The high-band gain $g_{HB}$ is obtained by

$$ g_{HB} = \sqrt{\frac{\sum_{i=0}^{63} s_{HB}^2(i)}{\sum_{i=0}^{63} s_{HB2}^2(i)}} \qquad (58) $$

where $s_{HB}(i)$ is the band-pass filtered input speech and $s_{HB2}(i)$ is the high-band speech synthesis obtained by filtering the high-band excitation $u_{HB}(i)$ through the high-band synthesis filter $A_{HB}(z)$ described in Section 6.3.2.2.



6 Functional description of the decoder 

The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, adaptive codebook gain, fixed codebook vector, fixed codebook gain and high-band gain) and performing synthesis to obtain the reconstructed speech. The reconstructed speech is then postprocessed and upsampled (and upscaled). Finally, the high-band signal is generated for the frequency band from 6 to 7 kHz. The signal flow at the decoder is shown in Figure 3.

6.1 Decoding and speech synthesis 

The decoding process is performed in the following order: 

Decoding of LP filter parameters: The received indices of ISP quantization are used to reconstruct the quantized ISP vector. The interpolation described in Section 5.2.6 is performed to obtain 4 interpolated ISP vectors (corresponding to 4 subframes). For each subframe, the interpolated ISP vector is converted to the LP filter coefficient domain $\hat{a}_i$, which is used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe: 

1. Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find the integer and fractional parts of the pitch lag. The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using the FIR filter described in Section 5.7. The received adaptive filter index is used to determine whether the filtered adaptive codebook vector is $v_1(n) = v(n)$ or $v_2(n) = 0.18\,v(n) + 0.64\,v(n-1) + 0.18\,v(n-2)$.

2. Decoding of the innovative vector: The received algebraic codebook index is used to extract the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer part of the pitch lag is less than the subframe size 64, the pitch sharpening procedure is applied, which translates into modifying $c(n)$ by filtering it through the adaptive prefilter $F(z)$. This prefilter consists of two parts: a periodicity enhancement part $1/(1 - 0.85z^{-T})$ and a tilt part $(1 - \beta_1 z^{-1})$, where $T$ is the integer part of the pitch lag and $\beta_1$ is related to the voicing of the previous subframe and is bounded by [0.0, 0.5].

3. Decoding of the adaptive and innovative codebook gains: The received index gives the fixed codebook gain correction factor $\hat{\gamma}$. The estimated fixed codebook gain $g'_c$ is found as described in Section 5.9. First, the predicted energy for every subframe $n$ is found by

$$ \tilde{E}(n) = \sum_{i=1}^{4} b_i\,\hat{R}(n-i) \qquad (59) $$

and then the mean innovation energy is found by

$$ E_i = 10\log_{10}\left(\frac{1}{N}\sum_{j=0}^{N-1} c^2(j)\right) \qquad (60) $$

The predicted gain $g'_c$ is found by

$$ g'_c = 10^{0.05\,(\tilde{E}(n) + \bar{E} - E_i)} \qquad (61) $$

The quantized fixed codebook gain is given by

$$ \hat{g}_c = \hat{\gamma}\,g'_c \qquad (62) $$

4. Computing the reconstructed speech: The following steps are for $n = 0,\ldots,63$. The total excitation is constructed by

$$ u(n) = \hat{g}_p\,v(n) + \hat{g}_c\,c(n) \qquad (63) $$

Before the speech synthesis, a post-processing of excitation elements is performed.

5. Anti-sparseness processing (6.60 and 8.85 kbit/s modes): An adaptive anti-sparseness post-processing procedure is applied to the fixed codebook vector $c(n)$ in order to reduce perceptual artifacts arising from the sparseness of the algebraic fixed codebook vectors, which have only a few nonzero samples per subframe. The anti-sparseness processing consists of circular convolution of the fixed codebook vector with an impulse response. Three pre-stored impulse responses are used, and a number impNr = 0, 1, 2 is set to select one of them. A value of 2 corresponds to no modification, a value of 1 corresponds to medium modification, and a value of 0 corresponds to strong modification. The impulse response is selected adaptively from the adaptive and fixed codebook gains, using the following procedure:

if gp < 0.6 then
    impNr = 0;
else if gp < 0.9 then
    impNr = 1;
else
    impNr = 2;

Detect an onset by comparing the fixed codebook gain to the previous fixed codebook gain: if the current value is more than three times the previous value, an onset is detected.

If there is no onset and impNr = 0, the median of the current and the previous 4 adaptive codebook gains is computed. If this value is less than 0.6, impNr = 0.

If there is no onset, the impNr value is restricted to increase by at most one step from the previous subframe.

If an onset is declared, the impNr value is increased by one if it is less than 2.

In the 8.85 kbit/s mode, the impNr value is increased by one.

6. Noise enhancer: A nonlinear gain smoothing technique is applied to the fixed codebook gain $\hat{g}_c$ in order to enhance the excitation in noise. Based on the stability and voicing of the speech segment, the gain of the fixed codebook is smoothed in order to reduce fluctuation in the energy of the excitation for stationary signals. This improves the performance in the presence of stationary background noise.

The voicing factor is given by $\lambda = 0.5(1 - r_v)$ with $r_v = (E_v - E_c)/(E_v + E_c)$, where $E_v$ and $E_c$ are the energies of the scaled pitch codevector and the scaled innovation codevector, respectively. Since the value of $r_v$ is between -1 and 1, the value of $\lambda$ is between 0 and 1. The factor $\lambda$ is related to the amount of unvoicing, with a value of 0 for purely voiced segments and a value of 1 for purely unvoiced segments.

A stability factor $\theta$ is computed based on a distance measure between adjacent LP filters. Here, the factor $\theta$ is related to the ISP distance measure and is bounded by $0 \le \theta \le 1$, with larger values of $\theta$ corresponding to more stable signals.

Finally, a gain smoothing factor $S_m$ is given by

$$ S_m = \lambda\,\theta \qquad (64) $$

The value of $S_m$ approaches 1 for unvoiced and stable signals, which is the case for stationary background noise signals. For purely voiced signals or for unstable signals, the value of $S_m$ approaches 0.

An initial modified gain $g_0$ is computed by comparing the fixed codebook gain $\hat{g}_c$ to a threshold given by the initial modified gain from the previous subframe, $g_{-1}$. If $\hat{g}_c$ is larger than or equal to $g_{-1}$, then $g_0$ is computed by decrementing $\hat{g}_c$ by 1.5 dB, bounded by $g_0 \ge g_{-1}$. If $\hat{g}_c$ is smaller than $g_{-1}$, then $g_0$ is computed by incrementing $\hat{g}_c$ by 1.5 dB, bounded by $g_0 \le g_{-1}$.

Finally, the gain is updated with the value of the smoothed gain as follows

$$ \hat{g}_c = S_m\,g_0 + (1 - S_m)\,\hat{g}_c \qquad (65) $$

7. Pitch enhancer: A pitch enhancer procedure modifies the total excitation $u(n)$ by filtering the fixed codebook excitation through an innovation filter whose frequency response emphasizes the higher frequencies more than the lower frequencies, and whose coefficients are related to the periodicity in the signal. A filter of the form

$$F_{inno}(z) = -c_{pe}\,z + 1 - c_{pe}\,z^{-1} \tag{66}$$

is used, where $c_{pe} = 0.125(1 + r_v)$, with $r_v = (E_v - E_c)/(E_v + E_c)$ as described above. The filtered fixed codevector is given by

$$c'(n) = c(n) - c_{pe}\big(c(n+1) + c(n-1)\big) \tag{67}$$

and the updated excitation is given by

$$u(n) = g_p\,v(n) + \hat{g}_c\,c'(n) \tag{68}$$

The above procedure can be done in one step by updating the excitation as follows:

$$u(n) = u(n) - \hat{g}_c\,c_{pe}\big(c(n+1) + c(n-1)\big) \tag{69}$$
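A floating-point sketch of the one-step update (69) is given below. The name `pitch_enhance` is hypothetical, and treating the codevector samples just outside the subframe as zero is an assumption of this sketch rather than something stated in this section.

```python
def pitch_enhance(u, c, g_c, r_v):
    """One-step pitch enhancer of eq. (69); floating-point sketch.

    u   : total excitation u(n) = g_p v(n) + g_c c(n) of the subframe
    c   : fixed codebook vector c(n), same length as u
    g_c : (smoothed) fixed codebook gain
    r_v : voicing factor in [-1, 1]
    c(-1) and c(N) are taken as zero (assumption of this sketch).
    """
    c_pe = 0.125 * (1.0 + r_v)
    n_len = len(c)
    out = []
    for n in range(n_len):
        prev = c[n - 1] if n > 0 else 0.0
        nxt = c[n + 1] if n + 1 < n_len else 0.0
        out.append(u[n] - g_c * c_pe * (nxt + prev))   # eq. (69)
    return out
```

Applying (67) and then (68) gives the same result; the one-step form simply avoids building $c'(n)$ separately.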

8. Post-processing of excitation elements (6.60 and 8.85 kbit/s modes): A post-processing of excitation elements procedure is applied to the total excitation $u(n)$ by emphasizing the contribution of the adaptive codebook vector:

$$\hat{u}(n) = \begin{cases} u(n) + 0.25\,\beta\,g_p\,v(n), & g_p > 0.5 \\ u(n), & g_p \le 0.5 \end{cases} \tag{70}$$

Adaptive gain control (AGC) is used to compensate for the gain difference between the non-emphasized excitation $u(n)$ and the emphasized excitation $\hat{u}(n)$. The gain scaling factor $\eta$ for the emphasized excitation is computed by:

$$\eta = \begin{cases} \sqrt{\dfrac{\sum_{n=0}^{63} u^2(n)}{\sum_{n=0}^{63} \hat{u}^2(n)}}, & g_p > 0.5 \\[2ex] 1.0, & g_p \le 0.5 \end{cases} \tag{71}$$

The gain-scaled emphasized excitation signal $\hat{u}'(n)$ is given by:

$$\hat{u}'(n) = \hat{u}(n)\,\eta \tag{72}$$
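The emphasis and AGC of equations (70) to (72) could be sketched as below for one subframe. The factor $\beta$ is taken as an input because it is not defined in this step; the function name is hypothetical, and returning $\eta = 1$ for an all-zero emphasized excitation is a guard added here.

```python
import math


def emphasize_excitation(u, v, g_p, beta):
    """Excitation emphasis with AGC, eqs. (70)-(72); 6.60 and 8.85 kbit/s only.

    u    : total excitation u(n) of the subframe
    v    : adaptive codebook vector v(n)
    g_p  : adaptive codebook gain
    beta : factor beta of eq. (70), supplied by the caller
    Returns the gain-scaled emphasized excitation u'(n).
    """
    if g_p <= 0.5:
        return list(u)                                  # no emphasis, eta = 1.0
    u_hat = [un + 0.25 * beta * g_p * vn for un, vn in zip(u, v)]   # eq. (70)
    e_u = sum(x * x for x in u)
    e_hat = sum(x * x for x in u_hat)
    eta = math.sqrt(e_u / e_hat) if e_hat > 0.0 else 1.0           # eq. (71)
    return [x * eta for x in u_hat]                                # eq. (72)
```

Because $\eta$ is the square root of an energy ratio, the output has the same energy as $u(n)$; only its shape moves toward the pitch contribution.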

The reconstructed speech for the subframe of size 64 is given by

$$\hat{s}(n) = \hat{u}(n) - \sum_{i=1}^{16} \hat{a}_i\,\hat{s}(n-i), \qquad n = 0, \ldots, 63 \tag{73}$$

where $\hat{a}_i$ are the interpolated LP filter coefficients.
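Equation (73) is a direct-form all-pole filter. The sketch below keeps the last 16 outputs as filter memory so that consecutive subframes join without discontinuities; the list-based memory layout and the name `lp_synthesis` are illustrative choices, not taken from the reference code.

```python
def lp_synthesis(u_hat, a, mem):
    """All-pole LP synthesis of eq. (73): s(n) = u(n) - sum_{i=1}^{m} a_i s(n-i).

    u_hat : excitation of one subframe (64 samples in the codec)
    a     : interpolated LP coefficients [a_1, ..., a_m] (a_0 = 1 implied)
    mem   : past outputs [s(-1), s(-2), ..., s(-m)], most recent first;
            updated in place so the next subframe continues seamlessly
    """
    order = len(a)
    out = []
    for x in u_hat:
        y = x - sum(a[i] * mem[i] for i in range(order))
        mem.insert(0, y)    # newest output becomes s(n-1) for the next sample
        mem.pop()           # drop the oldest sample, keeping `order` values
        out.append(y)
    return out
```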

The synthesized speech $\hat{s}(n)$ is then passed through the adaptive post-processing described in the following section.

6.2 High-pass filtering, up-scaling and interpolation 

The high-pass filter serves as a precaution against undesired low frequency components. The signal is filtered through the high-pass filter $H_{h1}(z)$ and the de-emphasis filter $H_{de\text{-}emph}(z)$.

Finally, the signal is upsampled to 16 kHz to obtain the lower band synthesis signal $\hat{s}_{16k}(n)$. $\hat{s}_{16k}(n)$ is produced by first upsampling the lower band synthesis $\hat{s}_{12.8k}(n)$ at 12.8 kHz by 5, then filtering the output through $H_{decim}(z)$, and finally downsampling it by 4.

(Up-scaling consists of multiplying the output from the high-pass filtering by a factor of 2 in order to compensate for the down-scaling at the pre-processing stage.)
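The 12.8 kHz to 16 kHz conversion described above is a rational resampling by 5/4. The sketch below shows only that structure: the coefficients of $H_{decim}(z)$ are not given in this section, so the taps `h` are a caller-supplied stand-in, and the gain of 5 that compensates for zero insertion is an assumption (a real design may fold it into the taps). Filter memory between blocks is ignored, and the function name is hypothetical.

```python
def resample_12k8_to_16k(x, h, gain=5.0):
    """Rational 5/4 resampling sketch: upsample by 5, FIR-filter, keep every 4th sample.

    x    : block of samples at 12.8 kHz
    h    : FIR taps standing in for H_decim(z)
    gain : compensates the 1/5 amplitude loss of zero insertion (assumption)
    """
    up = []
    for sample in x:
        up.append(sample * gain)
        up.extend([0.0] * 4)              # 4 zeros per input sample -> 64 kHz
    filtered = []
    for n in range(len(up)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * up[n - k]
        filtered.append(acc)
    return filtered[::4]                  # 64 kHz / 4 -> 16 kHz
```

With the five-tap boxcar `[0.2] * 5` the sketch reduces to a sample-and-hold interpolator, which is enough to check that a 64-sample subframe at 12.8 kHz yields 80 samples at 16 kHz.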



6.3 High frequency band 

For the higher frequency band (6.4-7.0 kHz), an excitation is generated to model the highest frequencies. The high frequency content is generated by filling the upper part of the spectrum with white noise, properly scaled in the excitation domain, which is then converted to the speech domain by shaping it with a filter derived from the same LP synthesis filter that is used for synthesizing the down-sampled signal.

6.3.1 Generation of high-band excitation

The high-band excitation is obtained by first generating white noise $u_{HB1}(n)$. The power of the high-band excitation is set equal to the power of the lower band excitation $u_2(n)$, which means that

$$u_{HB2}(n) = u_{HB1}(n)\,\sqrt{\frac{\sum_{k=0}^{63} u_2^2(k)}{\sum_{k=0}^{63} u_{HB1}^2(k)}} \tag{74}$$

Finally, the high-band excitation is found by

$$u_{HB}(n) = g_{HB}\,u_{HB2}(n) \tag{75}$$

where $g_{HB}$ is a gain factor.
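The power matching of (74) and the gain of (75) are sketched below. The codec uses its own noise generator; here a caller-supplied `random.Random` instance producing uniform samples stands in for it, which is an assumption. Following the summation limits of (74), the noise block has the same length as $u_2(n)$. The function name is hypothetical.

```python
import math
import random


def high_band_excitation(u2, g_hb, rng):
    """High-band excitation of eqs. (74)-(75); floating-point sketch.

    u2   : lower-band excitation u_2(k) of the subframe
    g_hb : high-band gain g_HB
    rng  : random.Random instance standing in for the codec's noise source
    """
    u_hb1 = [rng.uniform(-1.0, 1.0) for _ in range(len(u2))]   # white noise
    e_low = sum(x * x for x in u2)
    e_noise = sum(x * x for x in u_hb1)
    scale = math.sqrt(e_low / e_noise) if e_noise > 0.0 else 0.0
    u_hb2 = [x * scale for x in u_hb1]          # eq. (74): same power as u_2
    return [g_hb * x for x in u_hb2]            # eq. (75)


# Usage: one 64-sample subframe with a hypothetical gain of 0.5.
u_hb = high_band_excitation([0.1] * 64, 0.5, random.Random(0))
```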

In the 23.85 kbit/s mode, $g_{HB}$ is decoded from the received gain index.

In the 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85 and 23.05 kbit/s modes, $g_{HB}$ is estimated using voicing information and bounded to [0.1, 1.0]. First, the tilt of the synthesis, $e_{tilt}$, is found:

$$e_{tilt} = \frac{\sum_{n=1}^{63} \hat{s}_{hp}(n)\,\hat{s}_{hp}(n-1)}{\sum_{n=0}^{63} \hat{s}_{hp}^2(n)} \tag{76}$$

where $\hat{s}_{hp}(n)$ is the lower band speech synthesis $\hat{s}_{12.8k}(n)$ high-pass filtered with a cut-off frequency of 400 Hz. $g_{HB}$ is then found by

$$g_{HB} = w_{SP}\,g_{SP} + (1 - w_{SP})\,g_{BG} \tag{77}$$

where $g_{SP} = 1 - e_{tilt}$ is the gain for the speech signal, $g_{BG} = 1.25\,g_{SP}$ is the gain for the background noise signal, and $w_{SP}$ is a weighting function set to 1 when VAD is ON and to 0 when VAD is OFF. $g_{HB}$ is bounded to [0.1, 1.0]. In voiced segments, where less energy is present at high frequencies, $e_{tilt}$ approaches 1, resulting in a lower gain $g_{HB}$. This reduces the energy of the generated noise in voiced segments.
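The voicing-based gain estimate of (76) and (77), including the final bounding to [0.1, 1.0], can be written as a short function. The 400 Hz high-pass filtering that produces $\hat{s}_{hp}(n)$ is assumed to have been done by the caller, returning $e_{tilt} = 0$ for a silent subframe is a guard added for this sketch, and the function name is hypothetical.

```python
def estimate_hb_gain(s_hp, vad_on):
    """Estimate g_HB from the synthesis tilt, eqs. (76)-(77).

    s_hp   : 400 Hz high-pass filtered lower-band synthesis of one subframe
    vad_on : VAD decision; w_SP = 1 when VAD is ON, 0 when OFF
    """
    num = sum(s_hp[n] * s_hp[n - 1] for n in range(1, len(s_hp)))
    den = sum(x * x for x in s_hp)
    e_tilt = num / den if den > 0.0 else 0.0       # eq. (76), guarded
    g_sp = 1.0 - e_tilt                            # gain for speech
    g_bg = 1.25 * g_sp                             # gain for background noise
    w_sp = 1.0 if vad_on else 0.0
    g_hb = w_sp * g_sp + (1.0 - w_sp) * g_bg       # eq. (77)
    return min(max(g_hb, 0.1), 1.0)                # bound to [0.1, 1.0]
```

A strongly low-pass (voiced) subframe drives $e_{tilt}$ toward 1 and the gain down to the 0.1 floor, while a subframe with a falling autocorrelation keeps a larger gain.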

6.3.2 LP filter for the high frequency band
6.3.2.1 6.60 kbit/s mode 

The high-band LP synthesis filter $A_{HB}(z)$ is found by extrapolating the quantized ISF vector $f$ into a 20th-order ISF vector $f_e$. First, the lag $C_{max}$ that maximizes the autocorrelation of the ISF difference vector $f(i+1) - f(i)$, $i = 1, \ldots, 14$, is obtained. Then the new 16 kHz ISF vector $f_e(i)$ is computed by

$$f_e(i) = \begin{cases} f(i), & i = 1, \ldots, 15 \\ f_e(i-1) + f_e(i - C_{max} - 1) - f_e(i - C_{max} - 2), & i = 16, \ldots, 19 \end{cases} \tag{78}$$

An approximation of the element $f_e(19)$ of the new ISF vector is updated based on the lower frequency coefficients. The new extrapolated ISF difference vector $f_{e\Delta}(i)$ is

$$f_{e\Delta}(i) = c_{scale}\big(f_e(i) - f_e(i-1)\big), \qquad i = 16, \ldots, 19 \tag{79}$$

where $c_{scale}$ scales $f_{e\Delta}(i)$ so that $f_e(19)$ becomes equal to this approximation. In order to ensure stability, $f_{e\Delta}(i)$ is bounded by

$$f_{e\Delta}(i) + f_{e\Delta}(i-1) \ge 500, \qquad i = 17, \ldots, 19 \tag{80}$$

Finally, the extrapolated ISF vector $f_e$ is obtained by

$$f_e(i) = \begin{cases} f(i), & i = 1, \ldots, 15 \\ f_{e\Delta}(i) + f_e(i-1), & i = 16, \ldots, 19 \\ f(16), & i = 20 \end{cases} \tag{81}$$

$f_e$ is converted to the cosine domain to obtain $q_e$ with a 16 000 Hz sampling rate. The high-band LP synthesis filter $A_{HB}(z)$ is obtained by converting $q_e$ to an LP filter as described in 5.2.4 with $m = 20$.
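The extrapolation steps (78) to (81) are sketched below with 1-based indexing to mirror the equations. Several inputs are assumptions: $C_{max}$ is interpreted as a lag and passed in together with the target value for $f_e(19)$, since their computation is not spelled out in this section, and when (80) is violated the sketch raises the smaller of the two differences so that their sum becomes exactly 500 Hz, which is one possible way to enforce the bound rather than the documented one. Because $f_e(19) = f_e(15) + \sum_{i=16}^{19} f_{e\Delta}(i)$, the scale in (79) is the ratio of the target rise above $f(15)$ to the unscaled rise. The lag must satisfy $1 \le C_{max} \le 13$ so that all indices stay inside the vector, and the function name is hypothetical.

```python
def extrapolate_isf_6k60(f, c_max, f19_target, min_pair_gap=500.0):
    """ISF extrapolation to order 20 for the 6.60 kbit/s mode, eqs. (78)-(81).

    f          : quantized ISF vector f(1..16) in Hz, as a 16-element list
    c_max      : lag of the autocorrelation maximum (input, see lead-in)
    f19_target : approximation that f_e(19) should reach (input)
    Returns f_e(1..20) as a 20-element list.
    """
    fe = [0.0] * 21                     # 1-based indexing; fe[0] is unused
    for i in range(1, 16):
        fe[i] = f[i - 1]                # eq. (78), i = 1..15
    for i in range(16, 20):             # eq. (78), i = 16..19
        fe[i] = fe[i - 1] + fe[i - c_max - 1] - fe[i - c_max - 2]

    # eq. (79): scale the differences so that the sum lands f_e(19) on the target
    c_scale = (f19_target - fe[15]) / (fe[19] - fe[15])
    d = {i: c_scale * (fe[i] - fe[i - 1]) for i in range(16, 20)}

    # eq. (80): raise the smaller difference when a pair sums below the gap
    for i in range(17, 20):
        if d[i] + d[i - 1] < min_pair_gap:
            if d[i] > d[i - 1]:
                d[i - 1] = min_pair_gap - d[i]
            else:
                d[i] = min_pair_gap - d[i - 1]

    for i in range(16, 20):             # eq. (81), i = 16..19
        fe[i] = fe[i - 1] + d[i]
    fe[20] = f[15]                      # eq. (81), i = 20: f_e(20) = f(16)
    return fe[1:]
```

Note that when the stability bound is enforced, $f_e(19)$ may end up above the target; the bound takes precedence.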

6.3.2.2 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 or 23.85 kbit/s modes

The high-band LP synthesis filter $A_{HB}(z)$ is a weighted version of the low-band LP synthesis filter:

$$A_{HB}(z) = \hat{A}(z/0.6) \tag{82}$$

where $\hat{A}(z)$ is the interpolated LP synthesis filter. $\hat{A}(z)$ has been computed by analysing a signal sampled at 12.8 kHz, but it is now used for a 16 kHz signal. Effectively, this means that the frequency response $FR_{16}(f)$ of $A_{HB}(z)$ is obtained by

$$FR_{16}(f) = FR_{12.8}\!\left(\frac{12.8}{16}\,f\right) \tag{83}$$

where $FR_{12.8}(f)$ is the frequency response of $\hat{A}(z)$. This means that the band 5.1-5.6 kHz in the 12.8 kHz domain is mapped to 6.4-7.0 kHz in the 16 kHz domain.
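Weighting a polynomial as $A(z/\gamma)$ multiplies each coefficient $a_i$ by $\gamma^i$, and (83) amounts to scaling frequencies by $16/12.8 = 1.25$. The two helpers below illustrate both points; $\gamma$ is left as a parameter, to be set to the factor of (82), and the helper names are hypothetical.

```python
def weight_lp(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = 1 + sum_{i=1}^{m} a_i z^-i."""
    return [a_i * gamma ** i for i, a_i in enumerate(a, start=1)]


def map_to_16k(freq_hz_12k8):
    """Frequency at which a feature of A(z) appears when run at 16 kHz, per (83)."""
    return freq_hz_12k8 * 16.0 / 12.8
```

For example, `map_to_16k(5120.0)` gives 6400 Hz and `map_to_16k(5600.0)` gives 7000 Hz up to floating-point rounding, matching the band mapping stated above.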

6.3.3 High band synthesis 

$u_{HB}(n)$ is filtered through $A_{HB}(z)$. The output of this high-band synthesis, $s_{HB}(n)$, is filtered through a band-pass FIR filter $H_{HB}(z)$ with a passband from 6 to 7 kHz. Finally, $s_{HB}$ is added to the synthesized speech $\hat{s}_{16k}(n)$ to produce the synthesized output speech signal $\hat{s}_{output}(n)$.
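The three operations of 6.3.3 can be chained as in the sketch below. It assumes, as for the lower band, that $A_{HB}(z)$ is applied as the all-pole synthesis filter $1/A_{HB}(z)$; the band-pass taps `h_bp` are a stand-in because the coefficients of $H_{HB}(z)$ are not listed here, all filter memories start at zero, and the function name is hypothetical.

```python
def high_band_synthesis(u_hb, a_hb, h_bp, s16k):
    """Sketch of 6.3.3: 1/A_HB(z) synthesis, band-pass FIR, addition to s_16k.

    u_hb : high-band excitation u_HB(n)
    a_hb : coefficients [a_1, ..., a_m] of A_HB(z)
    h_bp : FIR taps standing in for the 6-7 kHz band-pass H_HB(z)
    s16k : lower-band synthesis at 16 kHz, same length as u_hb
    """
    s_hb = []
    for n, x in enumerate(u_hb):          # all-pole synthesis with zero memory
        acc = x
        for i, a_i in enumerate(a_hb, start=1):
            if n - i >= 0:
                acc -= a_i * s_hb[n - i]
        s_hb.append(acc)
    band = [sum(h_bp[k] * s_hb[n - k] for k in range(len(h_bp)) if n - k >= 0)
            for n in range(len(s_hb))]    # FIR band-pass filtering
    return [lo + hi for lo, hi in zip(s16k, band)]
```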



7 Detailed bit allocation of the adaptive multi-rate wideband codec

The detailed allocation of the bits in the adaptive multi-rate wideband speech encoder is shown for each mode in tables 12a to 12i. These tables show the order of the bits produced by the speech encoder. Note that the most significant bit (MSB) of each codec parameter is always sent first.
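Because each parameter is transmitted MSB first, a field spanning bits $s_a$ to $s_b$ can be read by shifting in one bit at a time. The sketch assumes the frame is available as a list of 0/1 integers in which list element `k - 1` holds bit $s_k$; the helper names are hypothetical, and the ISP field boundaries are copied from tables 12a to 12h.

```python
def read_parameter(bits, first, last):
    """Value of the field s<first>..s<last> (1-based, inclusive), MSB sent first."""
    value = 0
    for k in range(first, last + 1):
        value = (value << 1) | bits[k - 1]
    return value


# ISP index fields s2-s9, s10-s17, ..., s43-s47 of tables 12a-12h.
ISP_FIELDS = [(2, 9), (10, 17), (18, 23), (24, 30), (31, 37), (38, 42), (43, 47)]


def read_isp_indices(bits):
    return [read_parameter(bits, first, last) for first, last in ISP_FIELDS]
```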
Table 12a: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 477 bits/20 ms, 23.85 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s68           Codebook Index1 for track 1
s69-s79           Codebook Index1 for track 2
s80-s90           Codebook Index1 for track 3
s91-s101          Codebook Index1 for track 4
s102-s112         Codebook Index2 for track 1
s113-s123         Codebook Index2 for track 2
s124-s134         Codebook Index2 for track 3
s135-s145         Codebook Index2 for track 4
s146-s152         codebook gains
s153-s156         High-band energy
subframe 2
s157-s162         adaptive codebook index (relative)
s163-s262         same description as s57-s156
subframe 3
s263-s371         same description as s48-s156
subframe 4
s372-s477         same description as s157-s262
Table 12b: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 461 bits/20 ms, 23.05 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s68           Codebook Index1 for track 1
s69-s79           Codebook Index1 for track 2
s80-s90           Codebook Index1 for track 3
s91-s101          Codebook Index1 for track 4
s102-s112         Codebook Index2 for track 1
s113-s123         Codebook Index2 for track 2
s124-s134         Codebook Index2 for track 3
s135-s145         Codebook Index2 for track 4
s146-s152         codebook gains
subframe 2
s153-s158         adaptive codebook index (relative)
s159-s254         same description as s57-s152
subframe 3
s255-s359         same description as s48-s152
subframe 4
s360-s461         same description as s153-s254
Table 12c: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 397 bits/20 ms, 19.85 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s67           Codebook index1 for track 1
s68-s77           Codebook index1 for track 2
s78-s79           Pulse Selector for track 3
s80-s81           Pulse Selector for track 4
s82-s91           Codebook index2 for track 1
s92-s101          Codebook index2 for track 2
s102-s115         Codebook index for track 3
s116-s129         Codebook index for track 4
s130-s136         VQ gain
subframe 2
s137-s142         adaptive codebook index (relative)
s143-s222         same description as s57-s136
subframe 3
s223-s311         same description as s48-s136
subframe 4
s312-s397         same description as s137-s222
Table 12d: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 365 bits/20 ms, 18.25 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s59           Pulse Selector for track 1
s60-s61           Pulse Selector for track 2
s62-s63           Pulse Selector for track 3
s64-s65           Pulse Selector for track 4
s66-s79           Codebook index for track 1
s80-s93           Codebook index for track 2
s94-s107          Codebook index for track 3
s108-s121         Codebook index for track 4
s122-s128         VQ gain
subframe 2
s129-s134         adaptive codebook index (relative)
s135-s206         same description as s57-s128
subframe 3
s207-s287         same description as s48-s128
subframe 4
s288-s365         same description as s129-s206



Table 12e: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 317 bits/20 ms, 15.85 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s70           Codebook index for track 1
s71-s83           Codebook index for track 2
s84-s96           Codebook index for track 3
s97-s109          Codebook index for track 4
s110-s116         VQ gain
subframe 2
s117-s122         adaptive codebook index (relative)
s123-s182         same description as s57-s116
subframe 3
s183-s251         same description as s48-s116
subframe 4
s252-s317         same description as s117-s182
Table 12f: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 285 bits/20 ms, 14.25 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s70           Codebook index for track 1
s71-s83           Codebook index for track 2
s84-s92           Codebook index for track 3
s93-s101          Codebook index for track 4
s102-s108         VQ gain
subframe 2
s109-s114         adaptive codebook index (relative)
s115-s166         same description as s57-s108
subframe 3
s167-s227         same description as s48-s108
subframe 4
s228-s285         same description as s109-s166



Table 12g: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 253 bits/20 ms, 12.65 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s56           adaptive codebook index
s57               LTP-filtering-flag
s58-s66           Codebook index for track 1
s67-s75           Codebook index for track 2
s76-s84           Codebook index for track 3
s85-s93           Codebook index for track 4
s94-s100          VQ gain
subframe 2
s101-s106         adaptive codebook index (relative)
s107-s150         same description as s57-s100
subframe 3
s151-s203         same description as s48-s100
subframe 4
s204-s253         same description as s101-s150
Table 12h: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 177 bits/20 ms, 8.85 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s23           index of 3rd ISP subvector
s24-s30           index of 4th ISP subvector
s31-s37           index of 5th ISP subvector
s38-s42           index of 6th ISP subvector
s43-s47           index of 7th ISP subvector
subframe 1
s48-s55           adaptive codebook index
s56-s60           Codebook index for track 1
s61-s65           Codebook index for track 2
s66-s70           Codebook index for track 3
s71-s75           Codebook index for track 4
s76-s81           VQ gain
subframe 2
s82-s86           adaptive codebook index (relative)
s87-s112          same description as s56-s81
subframe 3
s113-s146         same description as s48-s81
subframe 4
s147-s177         same description as s82-s112



Table 12i: Source encoder output parameters in order of occurrence and bit allocation within the speech frame of 132 bits/20 ms, 6.60 kbit/s mode

Bits (MSB-LSB)    Description
s1                VAD-flag
s2-s9             index of 1st ISP subvector
s10-s17           index of 2nd ISP subvector
s18-s24           index of 3rd ISP subvector
s25-s31           index of 4th ISP subvector
s32-s37           index of 5th ISP subvector
subframe 1
s38-s45           adaptive codebook index
s46-s57           Codebook Index
s58-s63           VQ gain
subframe 2
s64-s68           adaptive codebook index (relative)
s69-s86           same description as s46-s63
subframe 3
s87-s109          same description as s64-s86
subframe 4
s110-s132         same description as s64-s86
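As a consistency check on tables 12a to 12i, the per-subframe field widths can be summed and compared with the frame sizes in the table captions. In the sketch below, "absolute" subframes carry the full adaptive codebook index and "relative" ones the shorter relative index; in the 6.60 kbit/s mode only subframe 1 is absolute, in all other modes subframes 1 and 3 are. The widths are transcribed from the tables; the data layout and names are illustrative choices.

```python
# Field widths per mode, transcribed from tables 12a-12i.
# fcb = fixed-codebook bits per subframe; hb = high-band energy bits per subframe.
MODES = {
    # mode:  (isp, acb_abs, acb_rel, ltp, fcb, gain, hb, abs_subframes, frame_bits)
    "6.60":  (36, 8, 5, 0, 12, 6, 0, 1, 132),
    "8.85":  (46, 8, 5, 0, 20, 6, 0, 2, 177),
    "12.65": (46, 9, 6, 1, 36, 7, 0, 2, 253),
    "14.25": (46, 9, 6, 1, 44, 7, 0, 2, 285),
    "15.85": (46, 9, 6, 1, 52, 7, 0, 2, 317),
    "18.25": (46, 9, 6, 1, 64, 7, 0, 2, 365),
    "19.85": (46, 9, 6, 1, 72, 7, 0, 2, 397),
    "23.05": (46, 9, 6, 1, 88, 7, 0, 2, 461),
    "23.85": (46, 9, 6, 1, 88, 7, 4, 2, 477),
}


def frame_bits(isp, acb_abs, acb_rel, ltp, fcb, gain, hb, abs_subframes):
    per_sf = ltp + fcb + gain + hb                # fields common to every subframe
    return (1 + isp                               # VAD flag + ISP indices
            + abs_subframes * (acb_abs + per_sf)
            + (4 - abs_subframes) * (acb_rel + per_sf))


for mode, fields in MODES.items():
    total = frame_bits(*fields[:-1])
    assert total == fields[-1], mode
    print(f"{mode} kbit/s: {total} bits per 20 ms frame = {total / 20.0:.2f} kbit/s")
```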



8 Homing sequences 

8.1 Functional description 

The adaptive multi-rate wideband speech codec is described in bit-exact arithmetic to allow easy type approval as well as general testing of its correct operation.

The response of the codec to a predefined input sequence can only be predicted if the internal state variables of the codec are in a predefined state at the beginning of the experiment. Therefore, the codec has to be put in a so-called home state before a bit-exact test can be performed. This is usually done by a reset (a procedure in which the internal state variables of the codec are set to their defined initial values). The codec mode of the speech encoder and speech decoder shall be set to the tested codec mode by external means at reset.

To allow a reset of the codec in remote locations, special homing frames have been defined for the encoder and the decoder, thus enabling codec homing by in-band signalling.

The codec homing procedure is defined in such a way that, in either direction (encoder or decoder), the homing functions are called after the processing of the homing frame. The output corresponding to the first homing frame therefore depends on the codec mode in use and on the codec state when that frame is received, and is hence usually not known. The response of the encoder to any further homing frame is by definition the decoder homing frame for the codec mode in use. The response of the decoder to any further homing frame is by definition the encoder homing frame. This procedure allows homing of both the encoder and the decoder from either side if a loop-back configuration is implemented, taking proper framing into account.

8.2 Definitions 

Encoder homing frame: The encoder homing frame consists of 320 identical samples, each 13 bits long, with the least 
significant bit set to "one" and all other bits set to "zero". When written to 16-bit words with left justification, the 
samples have a value of 0008 hex. The speech decoder has to produce this frame as a response to the second and any 
further decoder homing frame if at least two decoder homing frames were input to the decoder consecutively. The 
encoder homing frame is identical for all codec modes. 
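Checking whether an input frame is the encoder homing frame is a direct comparison against the definition above. The sketch assumes that the frame arrives as a list of left-justified 16-bit sample values already aligned with the encoder's frame segmentation; the function names are hypothetical.

```python
FRAME_LENGTH = 320     # samples per 20 ms frame at 16 kHz
EHF_SAMPLE = 0x0008    # value of every encoder-homing-frame sample (see 8.2)


def make_encoder_homing_frame():
    return [EHF_SAMPLE] * FRAME_LENGTH


def is_encoder_homing_frame(frame):
    """True only if every one of the 320 samples equals 0x0008."""
    return len(frame) == FRAME_LENGTH and all(s == EHF_SAMPLE for s in frame)
```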

Decoder homing frame: There are nine different decoder homing frames, which correspond to the nine AMR-WB codec modes. For each of these codec modes, the corresponding decoder homing frame is the natural response of the speech encoder to the second and any further encoder homing frame if at least two encoder homing frames were input to the encoder consecutively. The parameter values of each decoder homing frame are given in [4].



8.3 Encoder homing 



Whenever the adaptive multi-rate wideband speech encoder receives at its input an encoder homing frame exactly 
aligned with its internal speech frame segmentation, the following events take place: 

Step 1: The speech encoder performs its normal operation, including VAD and SCR, and produces, in accordance with the codec mode in use, a speech parameter frame at its output which is in general unknown. But if the speech encoder was in its home state at the beginning of that frame, then the resulting speech parameter frame is identical to the decoder homing frame that corresponds to the codec mode in use (this is how the decoder homing frames were constructed).

Step 2: After successful termination of that operation, the speech encoder invokes the homing functions for all sub-modules, including VAD and SCR, and sets all state variables into their home state. On reception of the next input frame, the speech encoder will start from its home state.

NOTE: Applying a sequence of N encoder homing frames will cause at least N-1 decoder homing frames at the 
output of the speech encoder. 
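The ordering in 8.3 (encode first, reset afterwards) is what makes the output for the first homing frame unpredictable and the output for every following one fixed. The sketch below illustrates this with a hypothetical `ToyEncoder` whose only state is a frame counter; it is not a model of the real encoder, only of the homing control flow.

```python
FRAME_LENGTH = 320
EHF_SAMPLE = 0x0008


class ToyEncoder:
    """Hypothetical stand-in for a speech encoder; its only state is a frame counter."""

    def __init__(self):
        self.state = 0                      # home state

    def encode(self, frame):
        params = (self.state, sum(frame) % 256)   # depends on state and input
        self.state += 1
        return params

    def reset(self):
        self.state = 0


def encode_with_homing(encoder, frame):
    """Step 1: normal encoding; step 2: homing, only after the frame is processed."""
    params = encoder.encode(frame)
    if len(frame) == FRAME_LENGTH and all(s == EHF_SAMPLE for s in frame):
        encoder.reset()
    return params
```

Feeding one ordinary frame followed by three homing frames produces one state-dependent output for the first homing frame and then two identical outputs, in line with the NOTE above: N homing frames give at least N-1 fixed responses.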



8.4 Decoder homing 



Whenever the speech decoder receives at its input a decoder homing frame which corresponds to the codec mode in use, the following events take place:

Step 1: The speech decoder performs its normal operation and produces a speech frame at its output which is in general unknown. But if the speech decoder was in its home state at the beginning of that frame, then the resulting speech frame is replaced by the encoder homing frame. This would not naturally be the case but is forced by this definition.

Step 2: After successful termination of that operation, the speech decoder invokes the homing functions for all sub-modules, including the comfort noise generator, and sets all state variables into their home state. On reception of the next input frame, the speech decoder will start from its home state.
NOTE 1: Applying a sequence of N decoder homing frames will cause at least N-1 encoder homing frames at the
output of the speech decoder. 

NOTE 2: By definition (!) the first frame of each decoder test sequence must differ from the decoder homing frame in at least one bit position within the parameters for LPC and the first subframe. Therefore, if the decoder is in its home state, it is sufficient to check only these parameters to detect a subsequent decoder homing frame. This definition is made to support a delay-optimized implementation in the TRAU uplink direction.
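The decoder side differs from the encoder side in that the output itself is replaced, not merely followed by a reset. The sketch below tracks whether the decoder is in its home state and substitutes the encoder homing frame only in that case. `ToyDecoder` is hypothetical, starting in the home state is an assumption, and the sketch compares the whole parameter frame rather than applying the shortcut of NOTE 2.

```python
FRAME_LENGTH = 320
EHF_SAMPLE = 0x0008


class ToyDecoder:
    """Hypothetical stand-in for a speech decoder; its only state is a frame counter."""

    def __init__(self):
        self.state = 0

    def decode(self, params):
        self.state += 1
        return [self.state] * FRAME_LENGTH

    def reset(self):
        self.state = 0


class HomingDecoder:
    """Steps 1 and 2 of decoder homing around a decoder object."""

    def __init__(self, decoder, dhf):
        self.decoder = decoder
        self.dhf = dhf            # decoder homing frame of the codec mode in use
        self.at_home = True       # assumption: start right after a reset

    def process(self, params):
        is_dhf = params == self.dhf
        speech = self.decoder.decode(params)            # step 1: normal operation
        if is_dhf and self.at_home:
            speech = [EHF_SAMPLE] * FRAME_LENGTH        # forced replacement
        if is_dhf:
            self.decoder.reset()                        # step 2: back to home state
        self.at_home = is_dhf     # conservative: any other frame leaves the home state
        return speech
```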
Figure 1 Simplified block diagram of the CELP synthesis model
Figure 2 Detailed block diagram of the ACELP encoder
Figure 3 Detailed block diagram of the ACELP decoder